Executive Summary

In August 2000, President Clinton issued an Executive Order requiring all federally 
assisted programs to provide access for persons with limited English proficiency. This 
order highlighted the need to consider language issues in the design and execution of 
federal, state and local service programs. Concurrently, it stimulated awareness of the 
need for scientifically reliable data on the prevalence of English proficiency and the 
steps needed to overcome existing barriers to collecting such information. 

Individuals in the United States who do not speak English well (referred to 
as language-minority individuals) represent a major challenge for health and social 
service agencies, educators, policy planners, and researchers. Although only about 
3 percent of the U.S. population aged 5 and over speak English poorly or not at 
all, the proportion varies substantially by age, nativity, education, and other factors. 
Demographers and other social scientists usually use large-scale household surveys, 
based on probability sampling, to collect data that accurately represent the characteristics 
of the U.S. population as a whole. Most surveys limit their interviewing to English or
English and Spanish, and respondents must have a relatively high level of proficiency
in the interview language.

If, as expected, the proportion of language-minority individuals in the population
increases over time, the representativeness of national samples will be increasingly
compromised. Indeed, population research based on what are purportedly nationally
representative surveys very often will overlook those immigrants likely to be the most
vulnerable. Since lack of language ability is often a barrier to accessing health care
and other social services, the inability to speak English well may contribute to
disparities in health outcomes.

In view of strong national commitments to (1) improving the inclusion of 
minorities in clinical trials; (2) reducing health disparities among subpopulations; 
and (3) developing cultural competence in health service delivery, researchers and 
policy makers should give added attention to language as a potential barrier excluding 
people from national surveys, as well as from access to health care and social services. 
To help find ways for survey research to capture the increasing linguistic diversity of 
the United States and hence be truly nationally representative, this report focused on 
current barriers to inclusion as well as ways to enable inclusion. 



Barriers to Inclusion 

A recurring theme throughout this report is that cost is the most significant barrier to
including language-minority populations in national studies. Four necessary but expensive tasks were
identified: (1) sampling to get sufficient numbers of subjects who do not speak English well;
(2) translating or developing survey instruments (including the concomitant costs of vetting
the translation, conducting focus groups, and/or piloting surveys); (3) recruiting, hiring, and
training bilingual interviewers; and (4) contacting and interviewing subjects who live in rural
or geographically diverse locations.

The geographic distribution of minority language populations may be a significant 
barrier to their inclusion in national studies. Language-minority individuals are often difficult 
to include in studies either because they are clustered in small, possibly remote areas, or because 
they are not concentrated in any particular area. Cost-effective sampling strategies based on 
geographic location therefore often cannot be used. 

Language change over time is a barrier to inclusion of language-minority groups in
research. The version of a language spoken by recent immigrants often differs significantly
from that of individuals who immigrated several years ago. And, among long-term
immigrants, those who live in isolated communities develop different dialects from those who
routinely interact with English speakers.

Lack of coherence with other research goals presents a barrier. Addressing specific 
language groups may not be well-integrated into a project’s major research focus, and may 
therefore seem an ad hoc, add-on component that does not fit well with the overall research 
goals and design. 

Use of community members as translators/interpreters may be a barrier.
While the use of local translators and interpreters can sometimes improve survey coverage,
their use also may be a barrier with regard to issues of confidentiality or culturally sensitive 
topics that respondents are uncomfortable with or reluctant to openly discuss with someone 
from their own community. Similarly, someone from the local community (either the current 
community or the community of origin of an immigrant) may invoke the class structure of 
the culture of origin, which can interfere with the goals of the research. 

Enabling Inclusion 

The challenges of including language-minority populations in national surveys and studies 
are not new, and many underutilized resources are already at hand. In addition, there are 
new technologies and potential solutions on the horizon. It is possible to decrease cost 
through innovative sampling approaches, rather than screening the general population.
For example, researchers can identify subjects through pre-existing lists based on
administrative records (e.g., birth registries, INS records, Medicare records). Other strategies
include using telephone interviews to conduct preliminary screenings, and cumulating
data from repeated surveys in order to increase sample sizes.
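
The arithmetic behind cumulating data from repeated surveys can be sketched as follows. This is a minimal illustration, assuming simple random sampling and independent waves; the 3 percent prevalence echoes the figure cited earlier in this summary, while the wave size of 10,000 and the target of 1,000 cases are hypothetical values chosen only for the example.

```python
import math

def expected_cases(sample_size: int, prevalence: float) -> float:
    """Expected number of sampled people who belong to the rare group."""
    return sample_size * prevalence

def waves_needed(target_cases: int, sample_size: int, prevalence: float) -> int:
    """Smallest number of pooled waves whose expected yield reaches target_cases."""
    per_wave = expected_cases(sample_size, prevalence)
    return math.ceil(target_cases / per_wave)

# Hypothetical design: 10,000 respondents per wave, 3 percent prevalence.
cases_per_wave = expected_cases(10_000, 0.03)
pooled_waves = waves_needed(1_000, 10_000, 0.03)
print(f"Expected cases per wave: {cases_per_wave:.0f}")
print(f"Waves to pool for 1,000 cases: {pooled_waves}")
```

Under these assumptions a single wave yields too few language-minority respondents for separate analysis, which is why pooling several waves is attractive despite the longer reference period it implies.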

Instrument translation should incorporate and expand on several important practices.
Translation should be done by professional bilingual translators, and the translations should 
be vetted (judged as to linguistic and cultural appropriateness) by monolingual speakers of 
the target language. Translated or parallel instruments should undergo cognitive testing to 
determine that they test/query the same concepts. Researchers should allow translation into 
Anglicized dialects. The retention/inclusion of English terms in the translated instrument 
is important when a concept does not exist in the target language and culture. Translations 
should also be tested in focus groups of monolingual speakers from or typical of the target 
research group, and should be piloted whenever possible. 

Researchers should build in time for translations when designing and planning 
studies. The English version of an instrument should be completed before beginning its 
translation, and there must be time to translate, evaluate, and test the translated version 
prior to the initiation of actual data collection in either language. Alternatively, researchers
could develop (or contract the development of) a parallel, culturally appropriate instrument,
either simultaneously with the English-language instrument or lagging behind the English
version but overlapping with it in timing.

The rapidly expanding sophistication of machine translation technology can reduce the
amount of time required for professional translators by allowing them to refine and
correct translations rather than shoulder the entire translation burden. Although not
applicable in all cases, some research should benefit from using one or more of the three
major types of machine translation currently in use: knowledge-based, corpus-based,
and human-in-the-loop.

In order to complement and inform future activities, researchers should ensure 
that they make optimal use of existing knowledge by building on the work of others 
and collaborating across disciplines. Researchers should: 

> Gather and share the experience of international organizations that already 
have multilingual survey experience (e.g., United Nations, Organisation for 
Economic Co-operation and Development, World Bank, Demographic 
Health Surveys, World Health Organization). 

> Archive translations and source texts to share and to combine with those 
of colleagues for potential use in machine translation memory databases. 

> Use existing survey instruments as a starting point whenever feasible.
For example, a survey from another country, already written in the
language of that country, might require refinements to accommodate
cultural adaptations that have taken place since a group emigrated,
but could provide a basis to build on.
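
The idea of a translation memory built from archived source texts and translations can be illustrated with a small sketch. It is not a description of any particular translation product: the `TranslationMemory` class, the 0.8 similarity threshold, and the sample question pairs are all hypothetical, and the fuzzy matching relies only on Python's standard `difflib` module.

```python
from difflib import SequenceMatcher

class TranslationMemory:
    """Toy store of (source, translation) pairs with fuzzy lookup."""

    def __init__(self) -> None:
        self._entries: dict[str, str] = {}

    def add(self, source: str, translation: str) -> None:
        """Archive a vetted translation for later reuse."""
        self._entries[source] = translation

    def lookup(self, query: str, threshold: float = 0.8):
        """Return (source, translation, score) for the best match, or None."""
        best = None
        for source, translation in self._entries.items():
            score = SequenceMatcher(None, query.lower(), source.lower()).ratio()
            if score >= threshold and (best is None or score > best[2]):
                best = (source, translation, score)
        return best

# Illustrative entries only; real entries would be vetted survey items.
tm = TranslationMemory()
tm.add("How old are you?", "¿Cuántos años tiene usted?")
tm.add("Do you have health insurance?", "¿Tiene usted seguro médico?")

match = tm.lookup("Do you have any health insurance?")
if match is not None:
    print(f"Closest archived item: {match[0]!r} -> {match[1]!r}")
```

A near match of this kind would still go to a professional translator for review; the memory only supplies a starting draft, in keeping with the refine-and-correct role described above.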





(Diverse voices make sweet music;
as diverse conditions in our life
render sweet harmony . . .)

Dante, Paradiso IV: 124-126

Introduction



By most estimates, the non-English speaking population in the United States is increasing. 
Growing levels of immigration are the single most important factor contributing to the 
overall size and internal diversity of the language-minority population. Because it is difficult 
to incorporate those who do not speak English or do not speak English well into national 
studies and surveys, this group is often not included in these research efforts. If the growing 
language-minority populations are excluded by default, the representativeness of national 
samples will become increasingly compromised. Therefore, it is prudent to pay attention to 
language as a potential barrier for inclusion in national surveys and studies, or for access to 
health care and social services. But because we know there are substantial costs associated 
with developing and validating research instruments in multiple languages, and in adminis- 
tering surveys in multiple languages, most studies limit their subjects to English speakers, 
or English and Spanish speakers. This would suggest that population research based on 
what are purportedly nationally representative surveys very often will overlook the pre- 
sumably most vulnerable populations — those who don’t speak English well. Since language 
ability is often a barrier to accessing health care and other social services, the inability to 
speak English well may contribute to disparities in health outcomes. 

A number of questions and issues arise. Does a specific research initiative require the 
inclusion of certain groups and not others? Given that certain subpopulations have been 
defined as essential targets of study, which techniques are most likely to yield a statistically 
valid sample that accurately represents their characteristics? What are the barriers to
inclusion of language-minority populations, and what can we do to improve the situation?
How are current data collection efforts tackling language issues? These are preliminary
questions in a research process that will require information from respondents who often 
do not share the language or culture of those conducting the research. In national data- 
collection initiatives, developing effective methodologies for establishing communication 
in the field is as vital to the success of efforts to include language-minority subpopulations 
as is the use of innovative sampling techniques. 

To address these questions, representatives from the National Institute on Aging and
the National Institute of Child Health and Human Development, with funding assistance
from the NIH Office of Research on Minority Health, convened a workshop on the Inclusion
of Language-Minority Populations in National Studies. This report provides the outcome
of that workshop, held on the NIH campus in Bethesda, MD, on July 27-28, 2000.

In keeping with the multidisciplinary approaches fostered by both Institutes, partici- 
pants in the workshop included demographers, statisticians, sociologists, psychologists, 
linguists, anthropologists, experts in emerging computerized translation technologies, 
representatives of major private survey organizations and translation agencies, opinion 
leaders, and representatives of Federal agencies (including the U.S. Census Bureau, 
the Centers for Disease Control, and the Office of Management and Budget, as well as 
the NIH and other entities within the Department of Health and Human Services). 

On August 11, 2000, shortly after the meeting, then-President Clinton issued an
Executive Order requiring all federally assisted programs to provide access for persons 
with limited English proficiency. This order highlighted the importance of language 
issues, and stimulated awareness of the need for and importance of scientifically reliable 
data that include individuals who speak little or no English. Without information on 
language-minority populations, it is impossible to assess their needs and access to vari- 
ous forms of assistance, including health care. As the text of the Executive Order states 
emphatically, equal access to federally sponsored programs is a basic civil right, regard- 
less of whether an individual is a fluent English speaker. 

Describing the Language-Minority Population 

In 1990, almost 32 million individuals five years of age and older — 13.8 percent of the 
United States population within this age bracket, or one out of every seven people — 
spoke a language other than English at home. While a majority (79 percent) reported 
that they possessed functional levels of English proficiency, more than 6.3 million 
revealed that they either did not speak English well or could not speak the language 
at all (U.S. Census Bureau, 1999). Because of continued high rates of immigration, 
these numbers likely have increased during the past decade. The magnitude of the 
continuing transformation of the linguistic profile of the American population will 
become more apparent when data collected during the 2000 Census become available. 1 

People aged 5 years and older who do not speak or understand English very well 
are referred to henceforth as language-minority individuals. The language-minority 
population is heterogeneous, stratified racially, culturally, socially, and linguistically. 

1 On August 6, 2001, the Census Bureau released Census 2000 Supplementary Survey Summary Tables
for the U.S. documenting that 45 million individuals five years of age and older, representing
17.6 percent of persons in this age group, speak a language other than English at home, up from
the 13.8 percent reported in 1990. Of these 45 million individuals, more than 10.3 million, or nearly a
quarter, either speak English “not well” or “not at all.”

Patterns of geographic dispersion and large average household size of many
language-minority subpopulations, including many Hispanic subgroups, often preclude the use
of conventional, area-based household sampling procedures to capture language-minority
populations. For equal numbers of minority and non-minority individuals within a given
primary sampling unit (PSU), fewer minority individuals will be registered by surveys
because they occupy a smaller number of discrete household dwellings than do their
non-minority counterparts. Disproportionate numbers of minority populations also
live in group-based housing or in institutionalized settings, another factor that reduces
their selection rates in conventional household-based surveys.
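
The household-size effect can be made concrete with a small worked example. This is a sketch under stated assumptions, not a description of any specific survey design: it assumes households are sampled at a common rate and one person is interviewed per sampled household, and the population of 1,000, the household sizes of 2.5 and 4.0, and the 10 percent sampling rate are hypothetical.

```python
def households(people: int, mean_household_size: float) -> float:
    """Number of dwellings occupied by a group of a given size."""
    return people / mean_household_size

def expected_respondents(people: int, mean_household_size: float,
                         household_sampling_rate: float) -> float:
    """Expected interviews when one person is interviewed per sampled household."""
    return households(people, mean_household_size) * household_sampling_rate

# Two groups of equal size within the same PSU (hypothetical values).
smaller_households = expected_respondents(1_000, 2.5, 0.10)
larger_households = expected_respondents(1_000, 4.0, 0.10)
print(f"Group with mean household size 2.5: {smaller_households:.0f} interviews")
print(f"Group with mean household size 4.0: {larger_households:.0f} interviews")
```

With equal numbers of individuals, the group living in larger households occupies fewer dwellings and therefore contributes fewer interviews under this simplified design.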

Compared with the general population, language-minority subpopulations contain 
disproportionately high numbers of vulnerable members of our society, including adults 
and children living in or near poverty, the less educated, and the elderly. Recent studies 
have demonstrated that individuals with low levels of English proficiency or who are 
linguistically isolated 2 often have greater than usual difficulties gaining access to medical 
care and other social services than do English speakers (Young et al., 1987; Zahn, 1999; 
Phua and McNally, 1999). A lack of proficiency in English may contribute to the 
disparities in health outcomes among some minority groups. 

Extant Demographic Data on U.S. Linguistic Diversity 

Data from the 1990 Census, although neither complete nor unambiguous, offer a valu- 
able portrait of a key aspect of American linguistic diversity: the broad array of languages 
spoken by persons residing in the United States. The Census provides the relative distri- 
butions of languages over several major demographic categories (e.g., age and nativity) 
as well as information on the characteristics of language-minority speakers that may 
significantly influence aspects of survey design (particularly field protocols), such as 
relative levels of English proficiency, education, socioeconomic status, and the pro- 
portions of linguistically isolated households. 

In the 1990 Census, Spanish speakers accounted for 54 percent of the 32 million
individuals who reported speaking a language other than English at home 3 and slightly
more than two-thirds of the language-minority population, i.e., those who speak English
“not well” or “not at all” (Table 1). 4 Among the remaining 15 million people who spoke
a language other than English at home, no other single language dominated. The Census 
Bureau reported 380 languages and dialects spoken by respondents who spoke a language 
other than English at home. After Spanish, the next nine most frequently spoken languages 
were, in order of frequency: French, German, Italian, Chinese, Tagalog, Polish, Korean, 
Vietnamese, and Portuguese. Besides Spanish, only French, German, Italian, and Chinese
represent linguistic categories that included more than 1 million speakers, and no language
except Spanish was spoken by more than 1 percent of the total U.S. population. The Bureau
has generated a detailed demographic breakdown of the 50 non-English languages or language
families, such as Chinese and Tagalog, most frequently spoken by U.S. residents (U.S. Census
Bureau, 1999).

2 The Census Bureau defines a linguistically isolated household as one in which no person over the age
of 14 speaks only English or speaks the language “very well.”

3 As a result of their relative numbers, Spanish speakers have proven easier than other groups to include
in national surveys and studies.

4 Among the elderly aged 65 and over who did not speak English well or at all in 1990, about 45 percent
spoke Spanish, about 8 percent spoke Chinese, and about 7 percent spoke Italian (special tabulation by
J. McNally of the Census PUMS 1/1000 file). On August 6, 2001, the Census Bureau released new 2000
Supplemental Survey Tables with updated estimates showing that Spanish speakers now account for
60 percent of the 45 million individuals who reported speaking a language other than English at home.
Among the elderly aged 65 and over who did not speak English well or at all in 2000, about 50 percent
spoke Spanish.

Table 1

People Who Speak a Non-English Language at Home and Who Speak English
"Not Well" or "Not at All"

                             Speak   Speak English
                       Non-English   "Not Well" or   Percentage   Cumulative
                       Language at    "Not at All"       of all   Percentage
Language                      Home          (NELP)         NELP      of NELP

Spanish                 17,339,172       4,500,973        67.46        67.46
Chinese                  1,249,213         373,216         5.59        73.05
Korean                     626,478         188,419         2.82        75.88
French                   1,702,176         157,724         2.36        78.24
Italian                  1,308,648         151,262         2.27        80.51
Vietnamese                 507,069         143,173         2.15        82.65
German                   1,547,099         101,163         1.52        84.17
Polish                     723,483          98,384         1.47        85.64
Portuguese                 429,860          98,334         1.47        87.12
Japanese                   427,657          91,096         1.37        88.48
Russian                    241,798          65,304         0.98        89.46
Tagalog                    843,251          63,028         0.94        90.41
Thai (Laotian)             206,266          57,843         0.87        91.27
Mon-Khmer (Cambodian)      127,441          54,663         0.82        92.09
Greek                      388,260          44,035         0.66        92.75
French Creole              187,658          41,872         0.63        93.38
Armenian                   149,694          38,700         0.58        93.96
Hmong                       81,877          37,904         0.57        94.53
Arabic                     355,150          37,492         0.56        95.09
Hindi (Urdu)               331,484          29,503         0.44        95.53
Persian                    201,865          25,213         0.38        95.91
Navajo                     148,530          21,788         0.33        96.24
Yiddish                    213,064          17,474         0.26        96.50
Hungarian                  147,902          13,827         0.21        96.71
Ukrainian                   96,568          13,104         0.20        96.90
Gujarathi                  102,418          12,057         0.18        97.08
Rumanian                    65,265          11,381         0.17        97.25
Formosan                    46,044           9,691         0.15        97.40
Serbo-Croatian              70,964           9,512         0.14        97.54
Ilocano                     41,131           8,164         0.12        97.66
Panjabi                     50,005           7,720         0.12        97.78
Hebrew                     144,292           7,167         0.11        97.89
Dutch                      142,684           5,860         0.09        97.97
Slovak                      80,388           5,755         0.09        98.06
Czech                       92,485           5,714         0.09        98.15
Turkish                     41,876           5,677         0.09        98.23
Syriac                      35,146           5,404         0.08        98.31
Lithuanian                  55,781           5,076         0.08        98.39
Other                    1,294,837         107,529         1.61       100.00
Total                   31,844,979       6,672,201       100.00            -

Notes: The second column (NELP) shows the number of people in the first column who do not have English
language proficiency. Figures in the third column represent the proportion of all NELP (i.e., total U.S.
NELP population) in each language category.

Source: Data from the 1990 Census as compiled by Stevens, 2000.
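
The percentage and cumulative-percentage columns of Table 1 follow directly from the counts of people who speak English "not well" or "not at all." The sketch below shows the calculation for the first four rows, using the counts and the total printed in the table; the function name `shares_with_cumulative` is introduced here purely for illustration.

```python
TOTAL_NELP = 6_672_201  # "Total" row of Table 1

# First four rows of Table 1: (language, people speaking English
# "not well" or "not at all").
nelp_counts = [
    ("Spanish", 4_500_973),
    ("Chinese", 373_216),
    ("Korean", 188_419),
    ("French", 157_724),
]

def shares_with_cumulative(counts, total):
    """Return (language, percent of total, cumulative percent) per row."""
    rows, running = [], 0
    for language, count in counts:
        running += count
        share = round(100 * count / total, 2)
        cumulative = round(100 * running / total, 2)
        rows.append((language, share, cumulative))
    return rows

rows = shares_with_cumulative(nelp_counts, TOTAL_NELP)
for language, share, cumulative in rows:
    print(f"{language:<10}{share:>8.2f}{cumulative:>8.2f}")
```

Because the cumulative column is computed from the running count rather than from the rounded shares, it can differ in the last digit from a sum of the rounded percentages.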

Levels of English Proficiency 

Levels of English proficiency differ sharply among subpopulations that speak different
languages because these groups differ by age, nativity, duration of residence in the
United States, and level of education. The Spanish-speaking subpopulation contains
the largest number of persons who are not proficient in English; in 1990, approximately
4.5 million (more than 25 percent) of the country’s 17 million Spanish speakers
either spoke English “not well” or did not speak it at all. Other subpopulations
with significant proportions of individuals with relatively low levels of English
proficiency include those speaking Asian languages (e.g., Chinese, Korean, Vietnamese,
Thai, Cambodian, and Hmong 5) as well as speakers of Portuguese, Russian, French
Creole, and Armenian.

English language ability is related to nativity. Among the population who spoke 
a language other than English at home in 1990, 70 percent of U.S.-born individuals
spoke English “very well,” compared with only 41 percent of foreign-born individuals. 

English proficiency also varies markedly by age. It has been shown that immigrants who 
arrive in the United States as young children are almost certain to be proficient English 
speakers when they are adults, although the effects of age at immigration on English 
proficiency may be tempered by factors such as family background, educational history, 
and current familial characteristics (Stevens, 1999). In households where a non-English 
language was spoken in 1990, more than 62 percent of children between the ages of 
5 and 17 spoke English “very well,” as opposed to 53 percent of persons aged 65 
and over. 

In the general population, those who do not speak English or speak it poorly often
have low levels of education; a small fraction have such low levels of formal education that
they may not be functionally literate in their native language. The Census does not collect
data on levels of literacy, a significant omission since many surveys depend on the use of
a written instrument. Many of those with low English proficiency are poor; more than
half of these individuals are in poverty or near poverty. It is important to note that lack
of proficiency in English is not necessarily associated with poverty in the United States.
For example, among Hmong and Navajo who are proficient in English, large numbers
remain poor; on the other hand, very few non-English speakers of Japanese, Tagalog, Hindi,
Italian, Portuguese, Greek, or Gujarati are poor.

5 The case of the Hmong illustrates the variety of paths that language-minority groups may take to
the United States, as well as the diversity of educational/literacy skills among such groups. During
the Vietnam conflict, the U.S. Central Intelligence Agency (CIA) forged alliances with many Lao
ethnic groups, the most visible of which was with the Hmong (Hannah, 1987). After 1975, the Hmong
faced severe retaliation and many emigrated to the United States as refugees, settling primarily in
California, Minnesota, and Wisconsin (Wain, 1981; Knoll, 1982; Duchon, 1997). In the United
States, the Hmong have experienced high rates of unemployment, high levels of welfare use, low
rates of literacy, and relatively low levels of fluency in English (see, e.g., Downing, 1986; Portes and
Rumbaut, 1990). Many of the problems the Hmong have faced in the United States have been due
to low levels of literacy in their own language. Few Hmong have more than a few years of formal
schooling in Laos, and many, especially women, received no schooling at all (Duchon, 1997).

Language-minority subpopulations characterized by low levels of education and high 
poverty also tend to display relatively high rates of linguistic isolation, as do subpopulations 
with relatively high proportions of foreign-born, especially recent immigrants and the elderly. 
Twenty-six percent of elderly who speak a non-English language at home are linguistically 
isolated. After Spanish, the language groups with the highest proportions of linguistically 
isolated households include Chinese, Vietnamese, Korean, Cambodian, Thai, and Hmong. 
Many of these economically disadvantaged language-minority households contain no one 
over the age of 14 who speaks English fluently. This is problematic for surveys since there 
is no one in the household who can act as a translator or proxy. The language-minority 
subpopulations who are poor, relatively uneducated, linguistically isolated, elderly and 
have low levels of English proficiency may be precisely the groups that are likely to derive 
the greatest social benefits from having their characteristics and requirements fully docu- 
mented through national-scale surveys. 
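
The notion of linguistic isolation used above can be expressed as a simple classification rule. The sketch below follows the wording of the Census Bureau definition quoted in this report (no household member over the age of 14 speaks only English or speaks English "very well"); the `Person` record and its field names are hypothetical, and the handling of the age boundary is an assumption based on that wording rather than on Census processing rules.

```python
from dataclasses import dataclass

@dataclass
class Person:
    age: int
    speaks_only_english: bool
    english_ability: str  # "very well", "well", "not well", or "not at all"

def is_linguistically_isolated(household: list[Person]) -> bool:
    """True when no member over age 14 speaks only English or speaks it "very well"."""
    for person in household:
        if person.age > 14 and (
            person.speaks_only_english or person.english_ability == "very well"
        ):
            return False
    return True

# Hypothetical household: two adults with limited English and a fluent child.
household = [
    Person(age=45, speaks_only_english=False, english_ability="not well"),
    Person(age=43, speaks_only_english=False, english_ability="not at all"),
    Person(age=10, speaks_only_english=False, english_ability="very well"),
]
print(is_linguistically_isolated(household))
```

The example household is classified as isolated even though the child speaks English very well, which illustrates why such households often have no adult who can serve as a translator or proxy respondent.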

Geographic Distribution of Language-Minority Populations 

The five most populous states — California, New York, Texas, Florida, and Pennsylvania — 
contain about 60 percent of the language-minority population, but less than 40 percent of 
the total U.S. population. Within these states, the language-minority population tends to 
be concentrated in major urban centers. 

Despite the relatively high concentration of language-minority speakers on the coasts
and in the Southwest, there are distinct differences in geographic distribution among ethnic 
subpopulations. Hispanics are concentrated in a few states, tend to be urbanized, and tend 
to represent a sizable proportion of the population in the areas where they reside. This makes 
them relatively easy to capture in a nationally representative survey, as they will fall into a 
normal sampling frame. This also helps to explain recent successes in efforts to improve the 
representation of Spanish-speaking groups in national surveys. Unlike Hispanics, Asians tend 
to be more thinly dispersed throughout the country, typically representing 2 percent or less 
of the population of most states. The main exceptions are Hawaii and California, where 
Asians who speak non-English languages constitute 23 percent and 10 percent of the total 
state populations, respectively. It is extremely difficult to obtain nation-wide representative 
samples of small populations characterized by such geographic dispersion. While oversampling 
high-concentration strata through multi-staged stratified area sampling designs to enhance 
minority representation in national-level surveys is effective and cost-efficient for groups that 
tend to cluster (e.g., African Americans and Hispanics), such sampling designs are less 
effective when applied to small populations that are geographically dispersed (Santos,
1996; OMH, 1999).
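
The cost consequence of geographic dispersion can be illustrated with simple screening arithmetic. This is a rough sketch, assuming that each screened person is eligible with probability equal to the local share of the target group; the shares of 2 percent and 23 percent are the state-level figures mentioned above, while the target of 500 completed interviews is hypothetical.

```python
def expected_screens(target_interviews: int, prevalence: float) -> float:
    """Expected number of people screened to find target_interviews eligibles."""
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    return target_interviews / prevalence

# Hypothetical target of 500 interviews in two settings.
dispersed = expected_screens(500, 0.02)     # group is 2 percent of the population
concentrated = expected_screens(500, 0.23)  # group is 23 percent of the population
print(f"Screens needed at 2 percent:  {dispersed:,.0f}")
print(f"Screens needed at 23 percent: {concentrated:,.0f}")
print(f"Ratio: {dispersed / concentrated:.1f}")
```

Where the target group is concentrated, oversampling those areas reduces the screening burden substantially; where the group is thinly spread, no stratum offers a comparable saving.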







Challenges for Including Language-Minority Populations in Surveys 

People who cannot communicate in English are routinely underrepresented in national
surveys. Actual rates of non-coverage due to language barriers remain uncertain because
(1) survey organizations do not routinely collect information on the number of individuals
excluded from the sample because of language difficulties and (2) relatively rare populations
are difficult to systematically include within sampling frames that are designed to obtain 
a national picture. However, failing to develop methods to measure the characteristics of 
language-minority subpopulations will progressively compromise the scientific quality of 
data collected through such surveys as the size (and possibly the national proportion) of 
these groups grows. It is therefore good science to determine, on the basis of a coherent
body of scholarship, how to increase the coverage of these difficult-to-reach
subpopulations in national surveys.
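
The way non-coverage compromises survey estimates can be shown with the standard coverage-bias identity: the estimate from the covered population differs from the full-population value by the excluded share multiplied by the difference between covered and excluded groups. The sketch below is illustrative only; the 3 percent excluded share echoes the prevalence cited in the Executive Summary, and the insurance rates of 85 and 60 percent are hypothetical.

```python
def full_population_mean(share_excluded: float, mean_covered: float,
                         mean_excluded: float) -> float:
    """Mean over everyone, weighting covered and excluded groups by their shares."""
    return (1 - share_excluded) * mean_covered + share_excluded * mean_excluded

def coverage_bias(share_excluded: float, mean_covered: float,
                  mean_excluded: float) -> float:
    """Bias of the covered-only mean: excluded share times the group difference."""
    return share_excluded * (mean_covered - mean_excluded)

# Hypothetical outcome: proportion with health insurance.
share_excluded = 0.03
insured_covered, insured_excluded = 0.85, 0.60

bias = coverage_bias(share_excluded, insured_covered, insured_excluded)
truth = full_population_mean(share_excluded, insured_covered, insured_excluded)
print(f"Covered-only estimate: {insured_covered:.4f}")
print(f"Full-population value: {truth:.4f}")
print(f"Coverage bias:         {bias:.4f}")
```

The bias grows with both terms, so an increasing language-minority share and a widening gap in outcomes each make covered-only estimates less representative.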

If language-minority populations are excluded from surveys, they may end up being 
excluded from receiving government services they need. In an age of limited resources, 
policymakers often use data from surveys to assign priorities for funding programs and 
activities (OMH, 1999). Policymakers may therefore overlook the needs of language- 
minority populations, not for lack of interest, but because they lack the data they need 
to recognize the level and extent of unmet needs in these populations. 

The specific design features of any social survey are primarily determined by the 
questions researchers seek to answer. These questions often involve specific populations 
and the larger policy context that helps to define the objectives of the agency or agencies 
that have commissioned the research. The survey design includes the definition of survey 
variables, instrument formulation, methodologies governing data collection in the field, 
subsequent data processing and analysis, and a sampling plan that will select respondents 
who can provide data that accurately reflect the characteristics of the targeted population. 
Practical considerations also shape every survey design. In particular, the characteristics of 
the target population affect which field methodologies are used — for example, translated 
instruments and proxies or bilingual interviewers (or both) — as well as sampling techniques. 
The problems encountered in working out the details of field procedures influence and 
sometimes change survey objectives (Kish, 1965b). 



Difficulties in Assessing Language Usage 

Cost constraints dictate that most surveys rely on respondents’ self-assessments of linguistic 
proficiency. The objective validity of such self-evaluation is open to question, and evaluative 
categories are usually not explicitly defined. For example, respondents can interpret the 
categories of English proficiency as seen on the Census questionnaire (“very well,” “well,” 
“not well,” and “not at all”) differently since self-reported assessments of proficiency are 
by nature subjective. 

It is difficult to precisely identify and classify the languages that survey respondents 
speak. With any language, dialects vary and usage changes, and these changes often are 
highly localized. For example, French would appear to be an unambiguous language category, 
especially because of the high degree of standardization that has been imposed on Parisian 
French over the course of several centuries. Yet, in the United States, French includes 
the Cajun dialect spoken by native-born individuals, most of whom reside in Louisiana, 
an array of Creoles spoken by immigrants from several Caribbean countries, and the 
dialects spoken by immigrants from Francophone Africa and from Canada. Thus, the 
French-speaking subpopulation, which constitutes approximately 3 percent of the language- 
minority population and about 6 percent of the elderly language-minority population, 
is actually splintered into numerous subgroups whose distinctive non-language as well 
as linguistic characteristics complicate researchers’ attempts to compile fully representa- 
tive data. The same heterogeneity, differing only in degree, is observable in every other 
language-minority subpopulation. Thus, our data on language-minority subpopulations 
do not reflect the actual complexity faced by researchers attempting to design surveys 
that include these groups. 

Nonetheless, data gathered from the Census, presently the best available source 
of information on the linguistic demography of the United States, give researchers an 
estimate of the nature and overall scale of the practical problems they can anticipate as 
they plan to include language-minority subpopulations in large-scale surveys of the general 
population. Furthermore, an array of valid, well-tested probability-based sampling techniques 
has been developed over the past thirty years and successfully used to gather data on other 
minority populations; these techniques may be creatively utilized to gather data describing 
small subpopulations defined by language characteristics (Santos, 1996). 



The Multiplicity of Languages 

The diversity of languages among non-English speakers makes it difficult to include the 
language-minority population in surveys and national studies. Data from the 1990 Census 
suggest that reaching 80 percent of the 6.7 million persons aged 3 and above who did not 
speak English or who did not speak English well would require including Spanish, Chinese, 
Korean, French and Italian. Reaching 90 percent coverage would require the use of seven 
additional languages: Vietnamese, German, Polish, Portuguese, Japanese, Russian, and 
Tagalog (Table 1).6 Most data-gathering organizations cannot afford to include 90 percent 
of the language-minority population because of the extremely expensive processes 
of translating instruments and hiring/training bilingual interviewers. 

Relatively low coverage rates for at least some language-minority populations are 
inevitable in any national survey effort because it is virtually impossible to overcome all 
language barriers. Even the Census Bureau, with a budget exceeding four billion dollars 
for the 2000 Census, limited costs by translating the questionnaire into only five languages: 
Spanish, Chinese, Korean, Vietnamese, and Tagalog. The Bureau did, however, make 
other efforts to reach small language-minority populations (see Appendix A). Reducing 
non-coverage to scientifically acceptable levels can only be accomplished by carefully 
coordinating decisions about instrument translation and the use of bilingual interviewers 
with information about the characteristics of targeted segments of the language-minority 
population. Less expensive, but also less satisfactory, methods such as the judicious use of 
proxies or other household members who speak English and can act as translators may 
also be exploited when language barriers cannot be otherwise overcome. 

6 To reach 80 percent of the elderly population who did not speak English well or at all would require the 
use of nine languages (Spanish, Chinese, Italian, French, Korean, Russian, Polish, Tagalog, and Portuguese). 
Reaching 90 percent would require the use of eight additional languages: German, Japanese, Vietnamese, 
Ilocano, Armenian, Greek, Hindi, and Yiddish (special tabulation by J. McNally of the Census 1/1000 PUMS). 

The highly diverse array of languages spoken by language-minority subpopulations, 
and their relative distributions through different strata of the overall population, result from 
the interaction of many factors: the presence of indigenous language-minority groups; the 
continued use of non-English languages learned during childhood among both foreign- and 
native-born; intergenerational transfer and maintenance of language, primarily among first- 
and second-generation immigrants; and immigration. Clustering of non-English speaking 
residents in small enclaves may make it unnecessary for some language-minority individuals 
to ever learn English. The changes in patterns of immigration over the past century are the 
single most important factor shaping the differential patterns of language distribution 
apparent in today’s language-minority population. 

Additional Factors Contributing to Underrepresentation in National Studies 

The acquisition of accurate, reliable data from members of small subpopulations is unusually 
difficult and costly for two other reasons: they often are not geographically concentrated, 
and elaborate screening processes are required in order to obtain samples large enough to 
yield statistically valid data. The National Center for Health Statistics (NCHS) has reported 
that in order to achieve an analytically meaningful oversample of Asians and Pacific Islanders 
(who constitute approximately 3 percent of the total U.S. population) in the National Health 
Interview Survey, an additional 13,000 screenings would have to be performed to identify 
2,580 eligible households. To achieve the same numerical oversampling of American Indians 
(0.8 percent of the total population), an additional 158,000 screenings would be required. 
Cost estimates for attempting such oversamples begin at a minimum of 1.5 million dollars 
annually (NCHS, 1999). According to 1990 Census figures, no language-minority subpopu- 
lation except Spanish speakers exceeds 1 percent of the general population. Conducting general 
screenings for even a highly limited number of non-Spanish-speaking language-minority 
subpopulations would be extremely expensive. In addition, computational costs rise when 
the number of stratifying subcategories used for data analysis increases, as is the case when 
multiple language groups are identified as analytically significant. 
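
As a rough illustration of why screening costs climb so steeply for rare groups, the sketch below computes the expected number of screening contacts under simple, untargeted screening, where each screened household is assumed to be eligible with probability equal to the group's population share. The function name and the untargeted-screening assumption are illustrative and are not taken from the NCHS report; the NCHS figures cited above reflect that survey's own design and are not reproduced by this formula.

```python
import math


def expected_screenings(target_eligible: int, prevalence: float) -> int:
    """Expected number of households to screen to find `target_eligible`
    eligible households, ASSUMING (hypothetically) that every screened
    household is eligible with probability `prevalence`."""
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    if target_eligible < 0:
        raise ValueError("target_eligible must be non-negative")
    return math.ceil(target_eligible / prevalence)


if __name__ == "__main__":
    # Illustrative only: the same target sample at two population shares.
    for share in (0.03, 0.008):
        print(f"share={share}: screen about {expected_screenings(1000, share)} households")
```

Because the required screenings scale with the inverse of the population share, halving the share roughly doubles the screening effort, which is why list-based frames and targeted supplements are attractive for small language groups.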



Sampling, Measuring and Interviewing Language-Minority Populations 

Sampling Procedures 

There are a number of cost-effective sampling techniques that can be used to yield valid, 
accurate data on language-minority subpopulations (Kish, 1965a; Santos, 1996). One 
simple method is to use pre-existing special lists to establish a selection frame for the 
targeted population. Researchers must carefully evaluate the coverage properties of such 
lists in order to avoid introducing bias. While most lists generated by commercial enterprises 
cannot be used to produce unbiased samples, some pre-existing lists compiled by government 
agencies have excellent coverage properties. For example, school rosters provide excellent 
coverage of school-aged children on the local level; the records of the Immigration and 
Naturalization Service provide complete listings of legal immigrants; the Centers for 
Medicare and Medicaid Services (formerly Health Care Financing Administration) has 
formulated a list of Medicare beneficiaries that includes more than 95 percent of all U.S. 
residents aged 65 and above. Using telephone interviews to conduct preliminary screen- 
ings can also help contain costs. However, surveys targeting minority populations must 
supplement telephone screenings with face-to-face screening (a dual-frame technique) 
in order to attain adequate coverage, because telephone ownership is less pervasive 
among poor and minority households than in the general population as a whole. 

Cumulation is another technique for developing frames that takes advantage of 
already-accomplished research, and is one of the least expensive methods by which an 
extensive list of individuals belonging to small subpopulations can be compiled quickly. 
Researchers identify a relatively large pool of potential respondents by reviewing data 
gathered over the course of several years through previously conducted large-scale surveys 
such as the General Social Survey (GSS), which began in 1972. While national 
surveys like the GSS usually employ probability-based, multi-stage area household 
sampling strategies that result in the inclusion of only a few respondents belonging 
to small sub-populations during any single episode, researchers can combine data from 
several sequential surveys to generate a fairly large number of potential respondents. 
Cumulation may have significant drawbacks, however. The information derived may 
be outdated, lists drawn from data gathered over the course of several years may not 
adequately reflect recent trends such as changes in immigration patterns, and incomplete 
coding of respondents’ language characteristics can preclude identification of language- 
minority speakers. 
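
The cumulation idea can be sketched in a few lines: pool respondents across several survey waves, keep those coded with the target language, and set aside records that are too old or that lack language coding. The record layout (`Respondent`, `home_language`, `survey_year`) is hypothetical and is not the layout of the GSS or any other real survey file.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Respondent:
    respondent_id: str
    survey_year: int
    home_language: Optional[str]  # None models incomplete language coding


def cumulate(waves: Iterable[Iterable[Respondent]],
             language: str,
             min_year: int) -> List[Respondent]:
    """Pool respondents who report `language` across all waves,
    dropping records collected before `min_year`."""
    pooled: List[Respondent] = []
    for wave in waves:
        for r in wave:
            if r.home_language is None:
                continue  # uncoded cases cannot be identified as language-minority
            if r.home_language == language and r.survey_year >= min_year:
                pooled.append(r)
    return pooled
```

The `min_year` cutoff and the skipped uncoded records correspond to two of the drawbacks noted above: outdated information and incomplete coding of language characteristics both shrink the usable pool.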

Perhaps the most flexible method for increasing the representation of small sub- 
populations in national surveys is the use of supplementary sampling techniques. Two basic 
strategies can be pursued: supplements may be integrated directly into national-scale surveys; 
or independent surveys can be carried out on the subnational level concurrently with national 
studies. A more sophisticated strategy for integrating supplementary samples into national- 
level surveys involves the use of multi-stage stratification techniques (Santos, 1991; 1996). 
First-stage units of a general population sampling frame are supplemented with minority- 
based Primary Sampling Units (PSUs) from which Secondary Sampling Units (SSUs) are 
derived through stratification by differential levels of minority concentration. SSUs in strata 
characterized by higher minority concentrations are then oversampled. This form of supple- 
mentary sampling has been a very efficient means of increasing coverage of small populations 
that cluster in specific areas. The NCHS has used supplementary sampling to target small 
subpopulations to create the Defined-Population Health and Nutrition Examination Survey 
(DP-HANES), which is conducted simultaneously with NHANES. The first supplement 
targeted Hispanic subpopulations (H-HANES) and proved extremely successful in yielding 
heretofore unobtainable data. 
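
A minimal sketch of the oversampling step, assuming second-stage units have already been grouped into strata by minority concentration: units in high-concentration strata are drawn at a higher sampling fraction, and each selected unit carries a design weight equal to the inverse of its realized selection rate so that weighted totals still represent the whole frame. The stratum names, sampling fractions, and function name are hypothetical; this is not the procedure used by NCHS or described by Santos.

```python
import random
from typing import Dict, List, Tuple


def oversample(strata: Dict[str, List[str]],
               fractions: Dict[str, float],
               seed: int = 0) -> List[Tuple[str, str, float]]:
    """Draw a simple random sample within each stratum at its own
    sampling fraction. Returns (unit, stratum, design_weight) tuples,
    where design_weight = stratum size / number sampled."""
    rng = random.Random(seed)
    selected: List[Tuple[str, str, float]] = []
    for name, units in strata.items():
        if not units:
            continue  # nothing to sample from an empty stratum
        # Sample at least one unit, and never more than the stratum holds.
        n = min(len(units), max(1, round(len(units) * fractions[name])))
        weight = len(units) / n
        for unit in rng.sample(units, n):
            selected.append((unit, name, weight))
    return selected


if __name__ == "__main__":
    frame = {
        "high_concentration": [f"h{i}" for i in range(20)],
        "low_concentration": [f"l{i}" for i in range(80)],
    }
    sample = oversample(frame, {"high_concentration": 0.5, "low_concentration": 0.1})
    print(len(sample), "units selected")
```

The weights are what keep the oversample from distorting national estimates: units from the heavily sampled stratum count for less in weighted totals.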

However, conducting supplementary regional or state-based surveys is expensive; in 
effect, two separate surveys must be financed and fielded simultaneously. Over the long term, 
however, the strategy can be economical if the basic design can be applied to a large number 
of small subpopulations on a rotating basis over the course of several years. For example, 
the first year’s supplementary survey might focus on Spanish-speaking subpopulations; the 
succeeding year’s on Chinese speakers; the third year’s on groups speaking French Creole 
(of which there are several), with a return after a specified interval to the initially targeted 
subpopulation. In the case of language-minority subpopulations, rotating supplements of 
language-minority participants may provide data on the complexity of language-minority 
subpopulations that may not be obtainable with any other sampling methodology. Data 
gathered from a number of small-scale surveys carried out on the regional or state level 
can also be aggregated in order to build a national picture. 

The qualitative information on small populations from local supplementary surveys may 
justify their cost. Compared with the Federal government and large national survey contractors, 
local data-gathering organizations (sometimes called “boutique” firms) may have better access 
to and knowledge of the special subpopulations located in their own territory. Such firms have 
often forged long-standing partnerships with local organizations that act as the “gate-keepers” 
of their communities; the imprimatur of these trusted neighborhood organizations often 
increases individual respondents’ willingness to participate in social research initiatives. 

When the targeted population is defined by language, boutique firms typically know the 
local vernacular, which helps increase communication, trust, and cooperation between 
researchers, community leaders, and potential respondents. 

Survey Instrument Issues 

Researchers have a limited number of options for collecting data on language-minority 
populations. They can translate existing research instruments and ancillary 
documents (e.g., advance letters explaining the purpose of the research), or create new 
instruments in the languages of the groups to be included. There are clearly advantages 
and disadvantages to both approaches. Literal verbatim translations are often inadequate, 
and should be back-translated (i.e., from the second language back into English) to verify 
linguistic accuracy, although translation efforts should not stop there. Both versions must 
ask the same or equivalent questions and thereby gather equivalent data. Cognitive 
equivalence of the concepts being investigated is crucial; the research team should con- 
sider consulting with anthropologists, linguists, psychologists, ethnographers, historians, 
and experts in religion. 

If cultural differences are not taken into account when survey instruments are 
translated, comparisons across subpopulations from different cultures may be seriously 
compromised (see Johnson et al., 1996). For example, a recent New Zealand study casts 
doubt on the cross-cultural validity of the European-designed SF-36, an international 
survey instrument measuring perceptions of health-related quality of life. This study 
compared responses of New Zealanders of European descent, Maori who had assimilated 
into European culture, and Pacific Islanders who had not assimilated (Scott et al., 2000). 
The researchers found that the first two groups gave comparable responses to items in the 
questionnaire, presumably because they shared certain basic European cultural assumptions 
about the relationship between mental and physical health. The responses given by Pacific 
Islanders, however, were not comparable to those of the other groups because their assump- 
tions about the mind-body relationship differed radically. A similar difference was found 
between Asian Indian and Pacific Islander elderly in Fiji regarding Western interpretations 
of health (Panapasa and McNally, 1997). 

The cultural differences that researchers must take into account involve not only 
varying concepts of health, well-being, and the nature of the self, but also differing percep- 
tions regarding hierarchical relationships such as kinship or communal structures. For example, 
in a recent survey of responses to an antismoking campaign, Mexican-American men found 
appeals to family responsibility more compelling than arguments based on the importance of 
preserving one’s own health. Some concepts that are commonplace in the U.S. context cannot 
be expressed in other languages. For example, it may be impossible to collect information on 
home equity loans from Sudanese Dinka tribe members since this concept does not exist in 
their culture. Some specific terms that are easily understood or recognized in English may not 
have an equivalent term in another language. For example, a group of elderly Koreans, long- 
term U.S. residents with relatively low levels of English proficiency, revealed during health- 
survey interviews that while they knew the English word “cholesterol,” they were unfamiliar 
with the equivalent Korean term (Hendershot et al., 1996). 

Interviewer Expertise 

No matter how well survey instruments are designed, there almost always remains the need 
for bilingual or multilingual interviewers. Linguistic fluency in a language alone is not enough 
to ensure competent data collection; the educational levels, language abilities, values and beliefs 
of potential respondents must also be considered. Interviewers must be sensitive to cultural 
differences both among and within language-minority subpopulations, and should have 
sufficient linguistic skills to tailor their own language appropriately. This level of linguistic 
proficiency and adaptability is most often found in native speakers. 

The relationship between the interviewer and the respondent can affect the quality 
of the data collected. Successful interviewers develop an atmosphere of trust and mutual 
support. Lack of a common culture or of cultural understanding and a common world- 
view can hinder the development of a fruitful relationship. 

One alternative to using bilingual interviewers is using third-party interpreters. 
This allows researchers to collect information from non-English speakers when no better 
alternatives are available, but there are several problems with this practice. Use of a 
third-party interpreter hinders the development of the interviewer-respondent relationship. 
The presence of third-party interpreters may constrain respondents from responding 
candidly, especially when sensitive topics are addressed. Third-party interpreters may 
interpose their own judgments and point of view, in either framing the question or 
translating the response. Also, the use of untrained third-party interpreters increases the risk of 
violating respondents’ confidentiality. Hence it is essential that third-party interpreters be 
trained about the importance of respondents’ right to privacy and the confidentiality of 
the information provided. 

Problems of Within-Group Heterogeneity 

In both translating and creating equivalent research instruments and in selecting and training 
interviewers, researchers must consider the heterogeneity within language groups. Language- 
minority subpopulations differ linguistically, demographically, and culturally. In all languages, 
accepted forms of usage are evolving constantly, and such changes are often highly localized. 
The Korean or Russian spoken by individuals who immigrated 25 years ago, for example, 
differs from that of more recent immigrants, who may perceive long-term migrants as speak- 
ing an archaic form of their native language. Both groups’ language will be affected by their 
exposure to English, but the length of time they have been interacting within a smaller com- 
munity of speakers of that language will result in different dialects or patterns of speech. 

Colloquialisms used by younger members of language-minority subpopulations 
are usually highly localized and often represent appropriations of English-language slang; 
these expressions may only be understood within the specific community, often an inner- 
city neighborhood. Bilingual interviewers must be skilled and flexible enough to detect 
and accommodate these highly particularized linguistic patterns. 

Perceptions of social appropriateness differ widely among ethnic groups and may 
affect responses to surveys. What is considered polite or acceptable by one subpopulation 
may be viewed as offensive, alienating, or inappropriate by another. A follow-up to a survey 
of elderly non-English-speaking Koreans residing in the United States who had participated 
in ACASI (Audio Computer Assisted Self Interview) interviews revealed that the respondents 
thought the female voice used in the computerized recording sounded too young. While the 
survey designers had carefully chosen this voice to be as pleasing as possible, the respondent 
reaction was affected by the respect their culture accords the elderly (Hendershot et al., 1996). 
Linguistic usage and cultural attitudes of various language-minority subpopulations may 
vary not only according to age and educational level, but also by class, regional origin, 
and ethnicity. Standard, grammatically correct usage, an academic vocabulary, and even 
a particular regional dialect can significantly hinder communication with subpopulations 
characterized by relatively low levels of education or who identify strongly with their 
region of origin, class, or ethnic group. Bilingual interviewers who lack professional train- 
ing or do not possess sufficient linguistic flexibility may unwittingly activate stratification 
structures if their accents, dialectal patterns, or behaviors betray regional or class origins 
and attitudes different from those of respondents; potential respondents may develop a 
sense of alienation or distrust. 

These regional dialect and class issues are community-specific and cannot be predicted 
without local information. This highlights the importance of establishing partnerships with 
local community associations, such as advocacy groups, charitable and religious organizations, 
civic groups, school boards, and local government task forces. Members of such organizations 
are often willing to help design research, review translated materials for accuracy and cultural 
appropriateness, and assist in developing strategies for locating and establishing rapport with 
potential respondents. Such input from local community leaders is invaluable as a sign of 
endorsement, lending legitimacy to the research effort, and can be crucial to the success of 
data-collection efforts. These organizations can also provide a means of sharing research 
findings with the community upon completion of the study (Mays, 1999). 

Strategies for Exerting Quality Control Over Translation and Interview Practices 

In light of the numerous linguistic and cultural variables that affect data collection, rigorous 
quality control over translation and interview practices is crucial to achieve cultural appro- 
priateness, accuracy, and sufficient precision, thereby ensuring the scientific integrity of the 
information gathered. Standardized protocols for translating survey instruments and for 
bilingual interviewing do not exist. Instrument translation and subsequent cognitive test- 
ing, as well as the recruitment and training of bilingual interviewers, often receive limited 
attention in most major social research initiatives, usually due to time and money constraints. 

In the United States, survey instruments and related documents are almost always 
developed in English first, even when translation into at least one other language (most 
often Spanish) is planned. Once the translation is completed, back-translation is often used 
as a verification process. Cognitive testing for equivalence is time-consuming and therefore 
often not done. While back-translation can provide confirmation of the literal accuracy of 
a translation, it is inadequate for evaluating cognitive equivalence or cultural appropriateness 
(personal communication, Kelly Jones Dresden, July 2000). 

Some smaller data-gathering organizations have begun experimenting with concurrent 
instrument development in English and a second language. To ensure that the data gathered 
in the two languages are comparable, these simultaneously developed instruments are rigor- 
ously tested for cognitive equivalency, often using monolingual focus groups in each language. 
This dual-focus approach is appealing because it avoids the problem of timing. Frequently, 
data collection in English must be initiated while the instrument is being translated into 
the second language. This practice precludes changes to the original instrument in order to 
achieve cognitive equivalence, or makes such changes more costly. 

Because of the semantic precision required in developing cognitively equivalent instru- 
ments, it is important to use professionally trained translators who understand the purpose of 
the research project and the meaning of the instrument items. Similarly, the recruitment and 
training of bilingual interviewers is an important consideration. To ensure that interviewers 
possess adequate fluency in the target language(s) and the ability to employ various levels of 
usage to accommodate the linguistic characteristics of potential respondents, native speakers 
or persons with native-like proficiency should be used. Interviewers’ linguistic competency 
should be objectively demonstrated, e.g., through standardized assessment of reading, conversa- 
tional fluency, and listening comprehension. The quality of an interviewer’s voice can also be 
assessed objectively: professionally trained, pleasing voices characterized by a relatively neutral 
accent are generally most effective. However, as noted earlier, what constitutes a “pleasing” 
voice can be largely culturally determined. Hiring interviewers from the communities in 
which targeted respondents reside can minimize cultural communication barriers; however, 
their use can also evoke deeply held culturally determined beliefs or customs and raise issues 
about confidentiality of responses that may influence the respondents’ willingness to provide 
certain types of information, such as details of personal health or intimate relationships. 

Because of the complexity of the relationship between interviewer and respondent, 
training interviewers will always be a challenging part of data collection. Similarly, instrument 
translation is unlikely to ever be entirely free of ambiguities or to be perfectly culturally 
appropriate. Tensions exist between the need for literal accuracy and the need for semantic/cognitive 
equivalence or comparability. The implicit contradiction between these goals is especially 
apparent in translations of precisely worded survey instruments that are required to meet 
the highest standards of scientific accuracy. 



Technological Innovation and Linguistic Logistics 

The technological aids available for interviewing represent a series of remarkable advances 
over what was available 25 years ago: laptops, cell phones, Computer Assisted Telephone 
Interviewing (CATI), Computer Assisted Personal Interviewing (CAPI), and Audio 
Computer Assisted Self Interviews (ACASI), which obviate the need for literacy. The 
Internet has the potential to add even greater flexibility for enhanced communications 
to support interviewing, and should reduce associated travel costs. 

These, however, are primarily communications media. Since the early 1990s, a new 
generation of technology has emerged for translation. This technology has been developed 
largely through Department of Defense initiatives, based on needs for linguistic training 
and translation/interpreting services for multiple purposes. While a new level of sophisti- 
cation exists in these technologies, there are limitations to their widespread use in national 
surveys and research studies. Nonetheless, they show promise for increased efficiency and 
lower costs as they are developed. The relative scarcity of professional bilingual translators 
and interviewers who also understand survey methodology increases the urgency and impor- 
tance of developing such technologies, and argues strongly for the consideration of ways to 
combine methods. For example, the use of machine technology as a first “rough cut” for 
translation can reduce the amount of time required for professional translators, allowing 
them to refine and correct rather than carrying the entire translation burden. Beginning 
with an existing instrument in the target language and having a professional translator 
with an in-depth cultural understanding of the target population modify that instrument 
to make it culturally appropriate can also save time, effort and money, although clearly 
the appropriateness of the instrument for the specific purpose of the study must also be 
carefully evaluated. 

Innovations that have emerged in the area of machine translation have been driven by 
markets both inside and outside the United States. The focus has been on languages needed 
to reach large groups of speakers, such as Spanish, French and Chinese. There clearly is less 
commercial demand for translation to what can be called “minority languages,” that is, those 
with fewer speakers. In addition, technological translation tools require significant mainte- 
nance to be worthwhile. For these reasons, machine translation is not a quick and easy 
answer to the challenge of including language-minorities in national surveys and studies. 
However, there are applications in current use and others in development that may be helpful. 

The goals of any machine translation program are to be of general purpose (able 
to translate any text), of high quality (matching human translation), and fully automatic 
(requiring no user intervention). Existing machine translation applications can meet any 
two of these goals but not all three at once. There are three major types of machine translation 
applications currently in use: knowledge-based, corpus- or example-based, and “human-in- 
the-loop” or efficiency tools. Each offers advantages and disadvantages. Knowledge-based 
systems use detailed knowledge of the language (grammar and other rules) to create high 
quality translations, but require an extensive development effort. It takes at least a year to 
develop such a system for a new language pair. Corpus-based machine translation systems 
translate by matching text in large databases of parallel text (similar to a technique called 
“translation memory”). More generalized example-based systems tag words in parallel text, 
and then translate sentences and phrases that have never been seen before based on matching 
words and phrases with common tags. Human-in-the-loop tools, as exemplified below, typi- 
cally are used in conjunction with knowledge-based and/or corpus-based systems to 
enhance efficiency. 
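
The translation-memory idea mentioned above, combined with a human-in-the-loop fallback, can be sketched as follows: an item is translated by exact lookup in a store of previously translated sentences, then by a close fuzzy match, and is otherwise routed to a professional translator. The function name, the similarity cutoff, and the sample English-Spanish entries are illustrative assumptions; this is not how any of the systems named in this report is implemented.

```python
import difflib
from typing import Dict, Optional, Tuple


def tm_lookup(source: str,
              memory: Dict[str, str],
              cutoff: float = 0.85) -> Tuple[Optional[str], str]:
    """Return (translation, status) for a source sentence.

    status is "exact" for a verbatim hit, "fuzzy" when a sufficiently
    similar stored sentence is reused (and should be reviewed by a
    translator), or "needs_translator" when nothing close is stored.
    """
    if source in memory:
        return memory[source], "exact"
    close = difflib.get_close_matches(source, list(memory), n=1, cutoff=cutoff)
    if close:
        return memory[close[0]], "fuzzy"
    return None, "needs_translator"


if __name__ == "__main__":
    # Hypothetical parallel entries for a health survey.
    memory = {
        "How old are you?": "¿Cuántos años tiene usted?",
        "Do you have health insurance?": "¿Tiene usted seguro médico?",
    }
    for item in ("How old are you?", "How old are you", "What is your address?"):
        print(item, "->", tm_lookup(item, memory))
```

A fuzzy hit returns the translation of a similar, not identical, sentence, so in a survey setting it would serve as a draft for a translator to check rather than as a finished item.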

Corpus-based systems are especially useful for dissemination purposes; they have one 
source language, are restricted to a controlled style, and address a single topic or domain. 
Because corpus-based systems are so tightly controlled and developed for special purposes, 
they offer a full semantic analysis. For some basic communication purposes, lower quality 
translations may perform adequately, but have real-time requirements. They can, however, 
be developed as soon as a parallel corpus of examples is available. 

There are some examples of machine translation applications in use or being piloted 
that have been developed for the Department of Defense. One, DIPLOMAT, is a rapid- 
deployment, wearable, speech-to-speech translation device that was developed for English 
and Croatian, Haitian Creole, Spanish, and partially developed for Korean. DIPLOMAT 
combines corpus-based and knowledge-based approaches, plus a morphological analyzer 
and a user-interface. Combining methods, or machine translation “engines,” allows 
developers to combine the strengths and avoid the weaknesses of the individual approaches; a 
statistical language modeler selects the best combination of outputs. Using multi-engine 
machine translation, an application for a new language can be developed within 
weeks; however, the new application undergoes improvement for months or years. 
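
The selection step in a multi-engine design can be illustrated with a toy example: several engines each propose a translation, and a statistical language model trained on target-language text picks the candidate that reads most naturally. The sketch below uses a simple add-one-smoothed bigram model and chooses one whole candidate, which is a simplification of selecting the best combination of outputs; the class and function names are hypothetical and do not describe DIPLOMAT's actual modeler.

```python
import math
from collections import Counter
from typing import Iterable, List


class BigramModel:
    """Toy bigram language model with add-one smoothing."""

    def __init__(self, corpus: Iterable[str]):
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for sentence in corpus:
            tokens = ["<s>"] + sentence.lower().split()
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab = len(self.unigrams) + 1  # one extra slot for unseen words

    def score(self, sentence: str) -> float:
        """Average log-probability per bigram; higher means more fluent."""
        tokens = ["<s>"] + sentence.lower().split()
        pairs = list(zip(tokens, tokens[1:]))
        if not pairs:
            return float("-inf")  # an empty candidate is never preferred
        logp = sum(
            math.log((self.bigrams[p] + 1) / (self.unigrams[p[0]] + self.vocab))
            for p in pairs
        )
        return logp / len(pairs)


def select_best(candidates: List[str], model: BigramModel) -> str:
    """Pick the engine output that the language model scores highest."""
    return max(candidates, key=model.score)
```

In practice the language model would be trained on a large target-language corpus, and fluency alone does not guarantee that a candidate preserves the meaning of the source item.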

TONGUES and NICE are other examples of the use of these combined approaches. 
TONGUES is an audio-voice translation guide, a hand-held speech-to-speech system. 
The example-based system combines a word-for-word dictionary translation, a glossary 
database for phrasal translation, and both general and domain-specific databases of sen- 
tences. It is being developed initially for humanitarian aid and applications other than 
war (e.g., dealing with civilian leaders), and the prototype is currently being pilot tested 
in Croatian and English by U.S. Army chaplains. Translations produced by a system such
as TONGUES are generally of lower quality than those produced by an extensive knowledge-based
system, but such a system can be developed more quickly. Examples of domain-specific sentences,
phrases, and words are collected to form the corpus that serves as training data for speech 
recognition and speech synthesis for this system. Because it is intended for domain-specific 
conversation, TONGUES assumes that the interviewer and respondent are face to face, 
and the system uses human feedback to clarify meanings. While it would not be appropriate
for telephone survey administration, TONGUES could be useful in door-to-door interview
administration for some surveys and research studies. Some potential disadvantages in survey
use are that it may feel unnatural to respondents, that there is a small delay for processing, and
that general-purpose speech recognition suffers when audio quality is low, as it is over
a telephone. While TONGUES would still require human translators to produce the corpus
specific to the project (i.e., to translate the survey), it could significantly speed the translation 
of survey materials. 
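
The layered lookup that an example-based system relies on (stored sentences first, then
glossary phrases, then word-for-word dictionary entries) can be sketched as follows.
This is an illustrative sketch only, not the TONGUES design: the function name
translate, the three toy databases, and the angle-bracket marking of untranslated
words are assumptions, and the Spanish entries stand in for a project-specific corpus
that human translators would have to supply.

```python
def translate(text, sentences, glossary, dictionary):
    """Translate by falling back from whole sentences to phrases to single words."""
    key = text.lower().strip()
    if key in sentences:                      # 1. exact match in the sentence database
        return sentences[key]
    words = key.split()
    output, i = [], 0
    while i < len(words):
        # 2. longest glossary phrase that starts at position i
        for j in range(len(words), i, -1):
            phrase = " ".join(words[i:j])
            if phrase in glossary:
                output.append(glossary[phrase])
                i = j
                break
        else:
            # 3. word-for-word dictionary; mark unknown words for human follow-up
            output.append(dictionary.get(words[i], "<" + words[i] + ">"))
            i += 1
    return " ".join(output)


# Hypothetical English-to-Spanish entries for a health survey.
sentences = {"where does it hurt": "dónde le duele"}
glossary = {"blood pressure": "presión arterial"}
dictionary = {"high": "alta"}

print(translate("Where does it hurt", sentences, glossary, dictionary))
print(translate("high blood pressure", sentences, glossary, dictionary))
print(translate("low blood pressure", sentences, glossary, dictionary))
```

The second call shows why such output is generally of lower quality: the phrase is
translated correctly, but the word-for-word step leaves the adjective in English word
order. The third call leaves a marked gap, the kind of case where human feedback or a
translator's correction would be needed.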

NICE, or Native-language Interpretation and Communication Environment, uses 
multi-engine machine translation to enable speakers of electronically underrepresented 
languages to participate in the information age. Through NICE, it is hoped that policy
makers will be able to access ideas, viewpoints, and information from developing nations.
In addition, it can provide assistance for unforeseen translation needs, such as humanitarian
aid requirements, and can be used in the documentation and preservation of endangered 
languages. Part of a larger program of Western Hemisphere collaboration, NICE currently 
includes Spanish and two indigenous languages of Latin America. While still under develop- 
ment, NICE offers promise for use in languages with smaller numbers of speakers where 
professional bilingual translators and interpreters are difficult to find. 

Translingual Information Detection, Extraction, Summarization (TIDES) is being 
developed for the Department of Defense in response to the demand for an “electronic 
linguist.” It is designed to support monolingual information analysts by automatically pro- 
cessing more than a billion information sources daily, including text, audio and web-based 
information in various languages. There are not enough linguists available to manage the 
huge volume of information, especially in minority languages. This application is intended 
to enhance international operations and increase the military’s ability to respond rapidly to 
crises, including humanitarian assistance, disaster relief, and consequence management. 

Summary 

The increasing number of individuals in the United States who do not speak English well 
represents a major challenge for health and social service agencies, educators, policy planners, 
and the social science research community. Although only about three percent of the U.S. 
population aged 5 and over speak English poorly or not at all, the proportion is substantially 
larger for specific population subgroups. Demographers and other social scientists usually use 
large-scale household surveys, based on probability sampling, to collect data that accurately 
represent the characteristics of the U.S. population as a whole. Most surveys limit their inter- 
viewing to English or English and Spanish, and respondents must have a relatively high level 
of proficiency in that language. If the proportion of language-minority individuals in the 
population increases, the representativeness of national samples is increasingly compromised. 
Excluding non-English speakers omits many of the most vulnerable in our population. Includ- 
ing respondents who do not speak English, or who have low levels of English proficiency, 
is costly due to the need for extensive screening procedures, instrument translation, and 
the use of fully trained, culturally competent bilingual interviewers. 

Important scientific knowledge can be gained from better representation of language-
minority subpopulations, which will prove crucial to the Presidentially mandated initiatives 
aimed at eliminating health disparities in minority populations as part of Healthy People 
2010, currently being pursued by the DHHS and the National Institutes of Health (NIH). 
Inclusion of language-minority speakers in large-scale statistical studies of the U.S. popu- 
lation is a natural complement of trans-NIH efforts to encourage the greater participation 
of members of minority groups in clinical trials and other aspects of medical research. 

A panel of experts pointed out that the challenges of including language-minority 
populations in national surveys and studies are not new and that many resources are already 
at hand. In addition, there are many new technologies and potential solutions on the horizon. 
However, in view of strong national commitments to (1) improving the inclusion of minorities 
in clinical trials; (2) reducing health disparities among subpopulations; and (3) developing 
cultural competence in health service delivery, 7 researchers and policy makers should give 
added attention to language as a potential barrier for inclusion in national surveys, as well 
as for access to health care and social services. 

Barriers to Inclusion 

A recurring theme throughout the workshop and this report is that cost is the most 
significant barrier to the inclusion of language-minority populations in national studies. 
But researchers and policy makers must also consider the costs — in terms of data validity 
and sample bias — of not including these subpopulations. Those omitted constitute, in 
many instances, not simply a parallel group that differs linguistically and culturally. Rather, 
the excluded often represent segments of the U.S. population that are less educated, of lower 
socioeconomic status, and more vulnerable along a number of social and health dimensions, 
and for all these reasons in greatest need of services whose provision may be based on the 
data collection in question. 

Four necessary, but expensive, tasks were identified: (1) sampling to get sufficient numbers
of subjects who do not speak English well; (2) translating or developing survey instruments
(including the concomitant costs of vetting the translation, conducting focus groups, and/or
piloting surveys); (3) recruiting, hiring, and training bilingual interviewers; and (4) contacting
and interviewing subjects who live in rural or geographically dispersed locations. Given
the time-consuming nature of tasks (2) and (3), time itself also becomes a barrier.

The geographic distribution of minority language populations may be a significant 
barrier. Language-minority individuals are often difficult to include in studies either because 
they are clustered in small, possibly remote areas, or because they are not concentrated in any 
particular area. Cost-effective sampling strategies based on geographic location therefore often 
cannot be used. 

Language change over time is a barrier to inclusion of language-minority groups in
research. All languages change over time; the version of language spoken by recent immigrants
is likely to differ significantly from that of individuals who immigrated several years
ago. Groups living in relatively isolated communities with little contact with their country
of origin are likely to have developed different dialects from those in more urban areas,
even if both groups immigrated at the same time.

7 For a definition of cultural competence, see the Office of Minority Health website http://www.omhrc.gov

Lack of coherence with other research goals presents a barrier. The issue of addressing
specific language groups may not be well integrated into a project’s major research focus,
and may therefore seem an ad hoc, add-on component that does not fit well with the overall 
research goals and design. 

Use of community members as translator/interpreters may be a barrier. While the 
use of local translators and interpreters can sometimes improve the quality of survey data, 
their use also can be a barrier with regard to issues of confidentiality and/or culturally sensi- 
tive topics that respondents are uncomfortable with or reluctant to openly discuss with some- 
one from their own community. Similarly, someone from the local community (either the 
current community or the community of origin of an immigrant) may invoke the class 
structure of the culture of origin, which can interfere with the goals of the research. 



Enabling Inclusion 

In spite of the barriers mentioned above, it is important to find ways to allow surveys and 
research studies to capture the increasing linguistic diversity of the United States and hence 
be truly nationally representative. While not all studies can achieve this, there are some 
current practices that offer useful approaches that should be considered. 

It is possible to decrease cost through innovative sampling approaches, rather than 
screening the general population. For example, researchers can identify subjects through 
pre-existing lists based on administrative records (e.g., birth registries, INS records, 
Medicare records). Other potential savings may ensue from judiciously employing 
commercially compiled lists, using telephone interviews to conduct preliminary screen- 
ings, and cumulating data from repeated surveys in order to increase sample sizes. 
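
The gain from cumulating data across repeated surveys can be illustrated with a short
calculation. The sketch below is illustrative only: the function name
pooled_proportion and the per-wave counts are hypothetical, and it assumes independent
waves drawn by simple random sampling, ignoring the design effects and weighting that
real household surveys require.

```python
import math


def pooled_proportion(waves):
    """Pool (respondents, respondents_with_attribute) counts across survey waves.

    Returns the pooled sample size, the estimated proportion, and its standard
    error under simple random sampling.
    """
    total_n = sum(n for n, _ in waves)
    total_x = sum(x for _, x in waves)
    p = total_x / total_n
    se = math.sqrt(p * (1 - p) / total_n)
    return total_n, p, se


# Hypothetical example: each wave reaches 150 language-minority respondents,
# of whom some number report a given characteristic.
single_wave = [(150, 45)]
three_waves = [(150, 45), (150, 60), (150, 30)]

n1, p1, se1 = pooled_proportion(single_wave)
n3, p3, se3 = pooled_proportion(three_waves)
print(n1, round(p1, 3), round(se1, 4))
print(n3, round(p3, 3), round(se3, 4))
```

With three waves of equal size and the same underlying proportion, the standard error
shrinks by a factor of the square root of three, which is the sense in which
cumulation substitutes for a larger and costlier single sample.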

Instrument translation should incorporate and expand on several important practices. 
Translation should be done by professional bilingual translators, and the translations should 
be vetted (judged as to linguistic and cultural appropriateness) by monolingual speakers of 
the target language. Translated or parallel instruments should undergo cognitive testing to 
determine that in fact they test/query the same concepts. Researchers should allow flexibility 
for inclusion of Anglicized dialects. The retention/inclusion of English terms in the translated 
instrument is important for cases when a concept may not exist in the target language and 
culture. Translations should also be tested in focus groups of monolingual speakers from 
or typical of the target research group, and should be piloted whenever possible. 

Researchers should build in time for translations when designing and planning 
studies. Time should be allowed to complete the English version of an instrument before 
beginning the translation, and time to translate, evaluate, and test the translated version 
prior to the initiation of actual data collection in either language. Alternatively, researchers
could develop (or contract development of) a parallel, culturally appropriate instrument 
simultaneously with the English language instrument, or lagged behind the English version 
but overlapping in timing. 

The rapidly expanding sophistication of machine technology can reduce the amount of
time required for professional translators by allowing them to refine and correct rather than 
shoulder the entire translation burden. Although not applicable in all cases, some research 
should benefit from using one or more of the three major types of machine translation 
currently in use: knowledge-based, corpus-based, and human-in-the-loop.

In order to complement and inform future activities, researchers should ensure that 
they make optimal use of existing knowledge by building on the work of others and 
collaborating across disciplines. Researchers should: 

> Gather and share the experience of international organizations that already 
have multilingual survey experience (e.g., the United Nations, Organisation 
for Economic Co-operation and Development, World Bank, Demographic 
Health Surveys, World Health Organization). 

> Archive translations and source texts to share and to combine with those 
of colleagues for potential use in machine translation memory databases. 

> Use existing surveys as a starting point whenever feasible. For example, 
a survey from another country, already written in the language of that 
country, might require refinements to accommodate cultural adaptations 
that have taken place since a group emigrated, but could provide at least a 
basis to build on. 

Appendix A. Recent and Current-Practice Examples



Although there are no standardized protocols for translation and bilingual interviewing, 
many national surveys have developed methods to accommodate respondents who speak 
languages other than English. These methods have been influenced by advances in the 
growing body of scholarship devoted to the effects of the interviewer/respondent relationship
on data quality, to theories of translation, and to issues of cross-cultural validity.

There are several examples of practices from which we can learn and on which to base 
some recommendations for inclusion of language-minority groups in national studies. 

The first is an historical example, that of refugee surveys performed under contract to 
the U.S. Government in the 1970s and 1980s. The others are current large-scale national 
data collection efforts that have grappled with the problems of developing cost-effective, 
practical, state-of-the-art approaches to formulating linguistically and culturally appro- 
priate methodologies for communicating with language-minority speakers: the decennial 
census, the New Immigrant Survey (NIS), and the Early Childhood Longitudinal
Study-Birth Cohort (ECLS-B). For information on the treatment of language-minority populations
in other ongoing national surveys, see McNally (2000).

Refugee Surveys in the 1970s and 1980s

After the 1975 influx of Southeast Asian refugees to the United States, the Government 
undertook periodic national surveys to assess how well the refugees were adapting. In the 
1980s the surveys were changed to annual surveys of respondents who had been in the 
country five years or less. These surveys, which represent 25 years of data with a reasonable 
measure of consistency over time, yield important lessons on the inclusion of linguistic 
minorities in survey research. Refugees often represent an influx of a new language minority 
group, rather than an accretion of a group already present in the country. This means that 
researchers must face the challenge of finding bilingual or multilingual translators and inter- 
preters. For the early surveys, there was an apparently unresolvable tension among the three 
goals of the surveys: to adequately represent the people, to standardize the methods and 
context to allow for comparison across populations and regions and through time, and 
to understand who these people were, how they were adapting, and why there might 
be difficulties. 

The early refugee surveys demonstrated that national origin was not the same as 
ethnicity, and that the assumption that there was a “national” language was often erroneous. 
Hence, national origin was abandoned in favor of a five-group ethnicity model in which 
ethnicity was coterminous with language: Lao, Hmong, Khmer, Vietnamese and Chinese 
(limited to Chinese from Viet Nam). Today this five-group approach remains the standard 
convention, and provides a categorical scheme that is practical and defensible. However, 
this model does not address two fundamental problems: how to deal with smaller
populations and how to deal with class, regional, and religious variations within the
larger populations.

The 2000 Census



While not a survey, the United States Census represents the baseline source for the most 
comprehensive information on non-English speaking individuals in the United States.
As part of its mandate, the Census Bureau attempts every 10 years to obtain information
on all individuals residing in the United States. It is the only organization that makes
a concerted effort to obtain demographic and socioeconomic data regardless of the 
language employed by the respondent. 

Population and Bureau staff characteristics, as well as time and money, weighed heavily 
in the decision about the number of languages into which the Census questionnaire should 
be translated. In 2000, households receiving a census form in the mail had the option of 
requesting a questionnaire in one of six languages: Chinese, English, Korean, Spanish, 
Tagalog, and Vietnamese. Approximately 2 million questionnaires were requested in these 
six languages. Preliminary estimates suggest that the response rate for each group ranged
from 30 to 46 percent.

For the 2000 Census, the Bureau initiated a program called the Census 2000 
Language Program (Martinez, 1998). This effort was designed to maximize the completion 
of census information and overcome language barriers that traditionally have limited some 
individuals/households from participating in the decennial census. Census 2000 Language 
Assistance guides were available in 43 languages beyond the six mentioned above, and were 
prepared for both the short and long census forms. The printed guides consisted of trans- 
lations of the questionnaire, multi-lingual instructions, and visual aids that an individual 
could use in order to complete the English version of the form. Fifteen million such guides 
were provided for dissemination. 

The Census used both centralized and decentralized data collection methods, including 
mailings, personal visits and telephone interviews. The Bureau used field representatives with 
fluency in specific languages, and to accommodate localized linguistic variations, the Bureau 
hired more than 300,000 bilingual interviewers from local neighborhoods throughout the 
country. The Bureau also engaged field representatives and regional office staff to identify 
interpreters from local organizations to serve as interviewers. Even with these efforts, officials 
estimate that approximately 1 percent of all non-responses during the 2000 Census resulted 
from insurmountable language barriers. 

The Early Childhood Longitudinal Study-Birth Cohort

Language minority families present special challenges for the Early Childhood
Longitudinal Study-Birth Cohort (ECLS-B). Data collection methods will include
CAPI interviews with parents, direct assessments of children, self-administered paper question- 
naires for fathers, and CATI interviews with child care providers. About 2,000 Asian, 1,500 
Hispanic, and 1,000 American Indian births will be included in a national sample of about 
15,000 children born throughout 2001. The ECLS-B approach to language minority issues 
is to make every reasonable effort to include these families in the study, to collect data without 
compromising quality in any major way, and to be sensitive to cultural differences presented 
by these families. At the same time, researchers are attempting to be mindful of the fixed
resources available to the project and make the best tradeoffs they can to reach out to 
minority language families without jeopardizing the overall study design. They have 
developed specific criteria and decision rules so that the procedures for including language- 
minority families are not arbitrary and the data are collected in a standardized manner. 

The approach is still under development, and is being tested in the field during 2000 and 
2001. Spanish speakers are by far the largest language minority group in the U.S.
Spanish-speaking field interviewers will collect data from families who prefer to speak Spanish.
There are Spanish versions of all data collection instruments and materials. For the most
common Asian languages spoken in U.S. households with young children (Mandarin, 
Japanese, Korean, Vietnamese, Thai, Cambodian, and Hmong), outreach materials and 
telephone assistance are available. A Chinese version of most of the study instruments has 
been tested and is currently under review. 

Field staff will be recruited from local areas where the sampled families reside. The 
staff includes interviewers who speak several Asian and Native American languages; however, 
the cost of supporting travel for staff throughout the country to match all the sampled house- 
holds that prefer to speak those languages in the interview would be prohibitive. Data will be 
collected from those households primarily with the aid of interpreters. Moreover, some assis- 
tance will be provided by central office telephone interviewers who speak these languages. 
Most of the ECLS-B direct child assessment measures are not very sensitive to language at 
the 9-month data collection point. For the 18-month data collection round, a version of the 
Massey attachment sort that has been translated into Spanish, Chinese, and Japanese is being 
tested. In both of these first two rounds of data collection, teaching interactions between 
parent and child will be videotaped. Spanish-speaking coders in the central office will code 
tapes from Hispanic households; other languages spoken on the tapes will be translated into 
English and transcribed for coding. Although much of the focus in developing the ECLS-B 
language minority protocol has been on the first two data collection points, their general 
approach incorporates a longitudinal perspective from which they address issues that are 
likely to occur over the course of the 6 waves of data collection, ending when the children 
are in first grade. 

The New Immigrant Survey 

The goal of this study is to advance the understanding of the characteristics of immigrants 
and their children, and the process of immigration and its impact on the United States. Past 
immigration research faced several serious challenges because of data limitations: most data on 
immigrants are cross-sectional, so dynamic processes related to individual immigrants cannot 
be investigated; sample sizes are usually extremely small, so analysis of individual country-of- 
origin groups is not possible; data on legal status (legal versus unauthorized) and visa category 
(e.g., refugee versus employment versus family reunification) are unavailable; data on entry 
cohort and length of time since entry are often misleading; and immigrants who return to 
their home country are systematically excluded. 

Building on 20 years of input from expert panels, researchers are fielding the New
Immigrant Survey, which avoids these problems. The sample, based on administrative 
records from the U.S. Immigration and Naturalization Service, is representative of new 
legal immigrants. The survey is longitudinal, collects retrospective data, includes infor- 
mation on immigrants themselves, their children (both U.S.- and foreign-born) and 
other household members, and follows immigrants who leave the United States. 

The pilot for this study (NIS-P) developed new research strategies for drawing the 
sample, locating sampled immigrants, subject retention, interview languages, sensitive 
questions, and cost-effective procedures. The NIS-P was a telephone survey based on a 
representative sample of persons admitted to legal permanent residence (that is, of people 
granted a “green card”) during July and August 1996 (Jasso et al., 2000). Interview 
instruments were translated into six languages: Spanish, Chinese, Russian, Polish, Korean, 
and Vietnamese. Bilingual interviewers conducted interviews in these languages and eleven 
others. Overall, 44 percent of the interviews were conducted in English, 26 percent in 
Spanish, and the remaining 30 percent in the sixteen other languages. Item response rates 
were considered comparable to or better than similar questions on the 1990 U.S. Census. 
Importantly, the survey provides information on topics that previously could not be
addressed because data were lacking or unreliable: immigrants’ educational levels, language skills,
income, links between legal and illegal immigration, marriage, health, mobility, and religion. 

Lessons learned from the NIS-P are that researchers must confront several barriers to 
inclusion when they are designing and implementing surveys. An example is the geographic 
dispersion of some population subgroups. In order to ensure appropriate inclusion of all 
groups, researchers must be willing to incorporate the costs and time required to (1) translate
and pilot instruments so that they will be culturally appropriate; and (2) recruit and train
interviewers. To ensure data quality, researchers must integrate the issue of inclusion of
language-minority groups into the design and planning of large surveys and studies,
rather than allowing these issues to be dealt with ad hoc after the design phase. Novel
approaches used in the NIS-P included preparation for dealing with multiple languages
in advance, offering respondents a choice of their home language or English, using well-trained
interviewers, using an introductory script, and experimenting with randomly
assigned versions of the translated instruments. The investigators felt that their approach 
maximized data quality, avoided activating stratification structures, and preserved respondents’ 
freedom to grow and change. 

Appendix B. Inclusion of Language-Minority Populations in National Studies:
Challenges, Opportunities, and Best Practices 

AGENDA 

July 27-28, 2000 

National Institutes of Health 
Natcher Conference Center 
Building 45, Rooms E1 and E2

Sponsored by 

National Institute of Child Health and Human Development 
National Institute on Aging and 
Office of Research on Minority Health 



Thursday, July 27, 2000 

2:00 p.m. Greetings and Introduction 

Rose Li, Chief, Demography and Population Epidemiology 
Behavioral and Social Research Program, NIA

Welcoming Remarks and Introduction of Keynote Speaker 

John Ruffin, Director, NIH Office of Research on Minority Health 

Keynote Address 

Nathan Stinson, Deputy Assistant Secretary for Minority Health 
OPHS, DHHS 

3:30 p.m. Overview: U.S. Linguistic Demography 

Gillian Stevens, Associate Professor, Department of Sociology 
University of Illinois at Urbana-Champaign

Overview: How Major U.S. Surveys Handle Non-English-Speakers’ Participation 
James McNally, Director, NACDA, University of Michigan 

4:30 p.m. Discussion 

6:00 p.m. No-Host Group Dinner — West End Grill, Bethesda, MD 
Speaker: Thomas Perez, Civil Rights Office 
ASPE, DHHS 



Friday, July 28, 2000 

8:00 a.m. Coffee and Check-in 

8:45 a.m. Greetings and Opening Remarks 
Rebecca L. Clark, NICHD 

9:00 a.m. Welcoming Address 

Yvonne Maddox, Acting Deputy Director, NIH 
and Deputy Director, NICHD 

9:20 a.m. Current Practices: What Works in the Field and What Doesn’t

(How and When To Translate, Cultural Considerations, Improving Response Rate,
Supply of Bilingual Interviewers, etc.)

Richard Bitzer, Lead Assistant Division Chief for Surveys, U.S. Census Bureau

Guillermina Jasso, Co-Principal Investigator, New Immigrant Survey 
New York University 

Brad Edwards, Westat, Project Director, Early Childhood Longitudinal 
Study-Birth Cohort 

Patty Maher, Associate Director, Data Services Division of Surveys and 
Technologies, University of Michigan Institute for Social Research 

Marjorie Hinsdale, Director of the National Household Survey on Drug Abuse 
Research Triangle Institute (for SAMHSA) 

David Haines, George Mason University

10:20 a.m. Break

10:30 a.m. Discussion

11:00 a.m. Technological Innovation and Linguistic Logistics

(e.g., translation tools, artificial intelligence, feasibility issues)

Robert Frederking, Senior Systems Scientist, Language Technologies Institute,
Carnegie Mellon University, “Current Research in Translating Minority Languages”

Marilyn Gaska, Manager, Advanced Technology, Lockheed Martin Federal Systems,
“TONGUES: Automated Translation of Conversation for the U.S. Army”

Kelly Jones Dresen, Director, Translation Department, Comprehensive Language
Center, Inc., “Theory vs. Practice: Translation, Technology, and Minority Languages”

Noon Discussion 

12:30 p.m. Lunch 

1:30 p.m. Barriers, Solutions, and Future Directions 

(e.g., costs, logistics, and cultural considerations; possibilities for cooperation, etc.)

Chair: Peggy McCardle, NICHD

Panelists: Katherine Wallman, Chief Statistician, Office of Management and Budget
Raynard Kington, Director, National Health and Nutrition
Examination Survey, National Center for Health Statistics, CDC
Craig Coelen, President, National Opinion Research Center
Wendy Baldwin, Deputy Director for Extramural Research, NIH

2:30 p.m. Discussion 

3:00 p.m. Wrap Up 

Robert Santos, Principal Research Associate, Urban Institute 



3:30 p.m. Adjournment 

4:00 p.m. Debriefing with Panelists 



Appendix C. Biographical Sketches of Presenters 



Wendy Baldwin, Ph.D. was appointed National Institutes of Health (NIH) Deputy 
Director for Extramural Research in February 1994. Dr. Baldwin has also served as 
Deputy Director of the National Institute of Child Health and Human Development 
(NICHD) at NIH and as the Chief of the Demographic and Behavioral Sciences 
Branch of NICHD. Dr. Baldwin currently heads a Public Health Service reinvention 
laboratory for the extramural program at NIH. In addition, she has been involved in 
the implementation of the NIH Revitalization Act regarding the inclusion of women 
and minorities in research. Her degrees are in social demography, with special attention 
to issues related to fertility, infant mortality, family, child well-being, AIDS risk behavior, 
and research and statistical methods. 



Richard Bitzer is Lead Assistant Division Chief for Surveys at the U.S. Census Bureau.
Mr. Bitzer has worked as a Survey Statistician for over 29 years at the U.S. Bureau of the
Census. He has spent almost all this time in the Field Division. He has had two tours of
duty at headquarters in Suitland, MD, and one tour of duty in the New York, Boston, and
Philadelphia Regional Offices. He has worked on the 1980 and 1990 Decennial Censuses
as well as all the major demographic surveys administered by the U.S. Census Bureau.
He has an undergraduate degree in mathematics from Millersville University.

Rebecca L. Clark, Ph.D. is a Program Official with the Demographic and Behavioral Sciences 
Branch of the National Institute of Child Health and Human Development. She manages a 
research portfolio in immigration, internal migration and population distribution, race and 
ethnicity, population and environment, demographic methods, and oversees several of the 
DBSB Population Centers. Before joining NICHD in February 2000, Dr. Clark was a senior 
researcher at the Urban Institute, where she conducted research on impacts of immigrants on 
the United States, Federal expenditures related to children, and other issues related to child 
well-being. Dr. Clark received her Ph.D. in Sociology (Demography) from Brown University 
in 1989. 



Craig G. Coelen, Ph.D., recently appointed President of the National Opinion Research 
Center (NORC) at the University of Chicago, has more than 30 years of experience in the 
management and development of research organizations and a record of extensive research 
on the financing and delivery of health care services. An economist who earned his doctor- 
ate from Syracuse University, Dr. Coelen taught econometrics and macroeconomic theory 
at Northeastern University, Boston, MA. Dr. Coelen moved into research administration 
and project direction in 1975 when he joined Abt Associates in Cambridge, MA, where 
he rose to the position of Senior Vice President and head of the Government Research 
Division. In 1991, Dr. Coelen became Senior Vice President of the Urban Institute in 
Washington, DC, where he served for almost 10 years before accepting the presidency
of NORC.

Kelly Jones Dresen is the Director of Translation and Interpretation Services for the
Comprehensive Language Center in Arlington, VA. She and her staff oversee translation 
and interpretation projects for the U.S. Government and private industry in more than 
100 languages. Recently, Ms. Dresen managed the translation into 49 languages of the 
Census Bureau's Language Assistance Guide for Census 2000. Ms. Dresen has more than
12 years of experience in the translation industry and has witnessed the effects of tech- 
nological innovation first-hand. 

Brad Edwards is a Vice President and Associate Area Director at Westat in Rockville, 
Maryland. He directs the Early Childhood Longitudinal Study-Birth Cohort (ECLS-B), 
a major new longitudinal survey for the National Center for Education Statistics (NCES) 
that will shortly begin enrolling a cohort of 15,000 babies born in 2001 and sampled 
from birth records. Data will be collected from the children, their parents and child care 
providers, and (eventually) their schools, using direct child assessment methods combined 
with computer assisted personal interviewing. Mr. Edwards is Westat’s corporate manager 
for two other longitudinal projects, the Kindergarten component of the ECLS (again, for 
NCES) and the Medicare Current Beneficiary Survey for the Center for Medicare and 
Medicaid Services (formerly the Health Care Financing Administration). His current 
research interests include usability issues in computer-assisted data collection systems, 
survey incentives, and methods for including language minorities in surveys. He began 
his survey research career at the National Opinion Research Center, first in Chicago 
and then New York, and also worked for Response Analysis Corporation in Princeton. 

He has a B.A. in Geography from the University of Chicago and participated in an 
Executive M.B.A. program at New York University. 

Robert Frederking, Ph.D. is a Senior Systems Scientist at the Language Technologies 
Institute (LTI) at Carnegie Mellon University and the Chair of LTI's graduate programs.
He is currently working on several projects in speech translation (TONGUES, LingWear, 
Nespole) and cross-language information access (MuchMore). He was the leader of the 
DIPLOMAT project, which combined CMU’s research in rapid-deployment machine 
translation, speech, and wearable computers to produce a wearable speech-to-speech 
translator that could be adapted quickly to new languages. Dr. Frederking received his 
Ph.D. in Computer Science/Artificial Intelligence from Carnegie Mellon University in 
1986. He has consulted for Carnegie Group Inc. and held research positions at CMU's
Robotics Institute and Siemens Corporate Research Laboratories in Munich, Germany. 

Marilyn Gaska, Ph.D. is a senior member of the technical staff and Manager of Advanced
Technology at Lockheed Martin Federal Systems in Owego, NY. She is also the Program 
Manager for Army ACT II BMTS and TONGUES contracts. She has been a technical 
architect on Department of Defense proposals and programs as well as a Principal Investi- 
gator for independent research and development projects involving commercial off-the-shelf 
open enterprise and e-business architectures for both combat support and commercial 
systems. She completed her Ph.D. in Systems Science at Binghamton University in May 1999. 

David Haines, Ph.D. is an Associate Professor of Anthropology at George Mason
University. He is the editor of three books on refugees in the United States including 
Refugees as Immigrants (1989), a compilation of the first decade of survey research on 
Southeast Asian refugees. More recently, he has co-edited Illegal Immigration in America 
(1999) and Manifest Destinies: Americanizing Immigrants and Internationalizing Americans
(2000). In addition to immigration issues, Dr. Haines has published work on Vietnamese
social history, policy and operational aspects of governance, and American culture and 
society. He was a research and policy analyst with the Federal refugee program, a Fulbright 
Fellow examining refugee programs in Western Europe, and a senior manager in State 
government before joining the staff at George Mason University. 

Marjorie Hinsdale has been a Survey Director with the Research Triangle Institute (RTI) 
since 1990. Since 1998, she has served as Director of Instrument Assessment and
Development for RTI's largest and most complex survey project, the National Household Survey
on Drug Abuse (NHSDA). She specializes in developing data collection instruments and 
training materials as well as supervising data collection activities for telephone and field 
surveys. She also has acted as the translation reviewer of Spanish documents for numerous 
RTI surveys and has trained Spanish-speaking bilingual interviewers. Prior to working on 
the NHSDA, Ms. Hinsdale served as Project Director for the National Hispanic Enumera- 
tion Survey, a national study conducted annually since 1994 for a commercial client. 

Ms. Hinsdale earned her B.A. in Sociology and Spanish from the University of North 
Carolina at Chapel Hill. 

Guillermina (Willie) Jasso, Ph.D. is Professor of Sociology at New York University.
Her major research interests are justice analysis, international migration, mathematical
models for theory building, and factorial survey methods for empirical analysis. Dr. Jasso 
received her Ph.D. at The Johns Hopkins University in 1974. Since then she has served 
on the faculties of Barnard College, Columbia University, the University of Michigan, 
the University of Minnesota, the University of Iowa, and New York University. Dr. Jasso 
also served as Special Assistant to the Commissioner of the U.S. Immigration and Natural- 
ization Service (1977-79) and as Director of Research for the U.S. Select Commission on 
Immigration and Refugee Policy (1979-80). In addition to authoring numerous scientific 
articles, she has served on many advisory boards. She was a member of the National Aca- 
demy of Sciences Panel on the Demographic and Economic Consequences of Immigration 
and of the Core Research Group of the Binational Study of Migration Between Mexico 
and the United States. 

Raynard Kington, M.D., Ph.D. was appointed the Director of the NIH Office of
Behavioral and Social Sciences Research in 2000. Prior to this, he was the Director of 
the Division of Health Examination Statistics at the National Center for Health Statistics. 
In that capacity, he served as Director of the National Health and Nutrition Examination 
Surveys, the only nationally representative study of the health of the American people based 
on clinical examination and biologic specimens. He has also been a Senior Scientist in the 
Health Program at RAND, where he was Co-Director of the Drew/RAND Center on 
Health and Aging, a National Institute on Aging Exploratory Minority Aging Center. 

Dr. Kington received his B.S. (with distinction) and his M.D. from the University of 
Michigan, completed his residency in Internal Medicine at Michael Reese Medical Center 
in Chicago, and was appointed a Robert Wood Johnson Clinical Scholar at the University 
of Pennsylvania. While at the University of Pennsylvania, he completed his M.B.A. (with 
distinction) and his Ph.D. with a concentration in Health Policy and Economics at the 
Wharton School. He is board-certified in Internal Medicine, Geriatric Medicine, and 
Public Health and Preventive Medicine. 

Rose Maria Li, Ph.D. is Chief of the Population and Social Processes Branch
and Deputy Director of the Office of Research Resources and Development, Behavioral
and Social Research Program, National Institute on Aging (NIA), National Institutes 
of Health. She is responsible for the scientific management of domestic and international 
research activities in the areas of demography, economics, population epidemiology, 
and health services. She is currently focusing in particular on a number of special areas 
of emphasis: health, work, and retirement; health disparities; healthy life expectancy; and 
linkages between early life influences and later life health. Dr. Li came to the NIA in her 
current capacity in June 1999. Previously, she was a Program Officer with the Demographic 
and Behavioral Sciences Branch of the National Institute of Child Health and Human 
Development (NICHD). Dr. Li received her Master's in Business Administration from
the University of Chicago in 1986 and earned her doctorate in Public and International 
Affairs from Princeton University in 1992, with a concentration in Population Policy. 

Yvonne Maddox, Ph.D. was named Deputy Director of the National Institute of 
Child Health and Human Development in January 1995. She is also currently serving 
as the Acting Deputy Director of the National Institutes of Health (NIH). Dr. Maddox 
received her doctorate in physiology and biophysics from Georgetown University, and 
has had a wide array of biomedical research and teaching experiences. Throughout her 
academic and Government career, Dr. Maddox has been recognized as a champion of 
women's issues. She plays a vital role in the identification of issues related to women as
scientists and as participants in research studies at the NIH level as well as at the U.S. 
Department of Health and Human Services level. She is guiding new approaches 
to funding research on innovative high priority areas. 

Patricia Maher, Ph.D. is the Associate Director for Data Collection and Processing
Services in the Division of Surveys and Technologies at the University of Michigan 
Institute for Social Research (ISR). In her current position, she is responsible for the 
management and implementation of the Early Childhood Longitudinal Study, Kinder- 
garten Cohort, as well as coordinating the data collection operations within the Division 
of Surveys and Technologies. She has more than 10 years of experience participating in 
and managing complex and large-scale data collection surveys. Dr. Maher has been with 
ISR since 1988, beginning her work in the centralized Telephone Center by recruiting, 
hiring, training, and managing staff. 

Peggy McCardle, Ph.D., M.P.H. is the Associate Chief of the Child Development and 
Behavior Branch of the National Institute of Child Health and Human Development, 
at the National Institutes of Health. In addition to her branch administrative duties, 
she is director of the research program on Language, Bilingualism and Biliteracy 
Development and Disorders. Dr. McCardle holds a Ph.D. in linguistics from the 
Pennsylvania State University, an M.P.H. from the Uniformed Services University 
of the Health Sciences in Bethesda, MD, and certification in speech-language pathology 
from the American Speech-Language-Hearing Association. Dr. McCardle serves as the 
Institute liaison to the National Reading Panel, in addition to leading the development 
of several new initiatives in literacy, including the formation of the Biliteracy Research 
Network, which currently consists of approximately five million dollars of NICHD- 
Department of Education jointly funded research on the development of English 
literacy in children whose first language is not English. 

James W. McNally, Ph.D. is a Senior Research Associate at the Institute for Social Research 
at the University of Michigan, Ann Arbor. He is also the Project Manager for the National 
Archive of Computerized Data on Aging, which is located with the Inter-university Consortium
for Political and Social Research at the University of Michigan. His research interests
are largely focused on survey methodology and the use of large data sets for secondary analysis. 
He has worked with a number of longitudinal data sets related to aging including SIPP, LSOA 
and NLTCS as well as census data and cross-sectional surveys from a variety of countries. He is 
particularly interested in the repair and enhancement of data and has worked on a variety of 
imputation strategies. Dr. McNally has also done work on migration and public health in the 
United States, Vietnam, Fiji, and the Philippines. Dr. McNally received his B.A. in Anthropol- 
ogy from the University of Maryland, College Park, his M.A. in Applied Demography from 
Georgetown University, and his Ph.D. in Demography and Sociology from Brown University. 

Thomas E. Perez, J.D., M.P.P. was appointed Director of the Office for Civil Rights (OCR)
for the U.S. Department of Health and Human Services on February 16, 1999. As Director 
of OCR, Perez is responsible for ensuring that programs and activities receiving funds from 
HHS are in compliance with all civil rights laws. Prior to this appointment, Mr. Perez served 
at the Department of Justice as Deputy Assistant Attorney General for Civil Rights, from 
January 1998 to February 1999. Mr. Perez received an A.B. in International Relations-Political
Science from Brown University in 1983, a J.D. cum laude in 1987 from Harvard
Law School, and a Master’s in Public Policy from the John F. Kennedy School of Govern- 
ment in 1987. 

John Ruffin, Ph.D. was appointed the first Director of the National Center on Minority
Health and Health Disparities at the National Institutes of Health (NIH) on January 9, 
2001. In this role he leads a national program of biomedical research, training and dis- 
semination of information on health conditions disproportionately affecting racial and 
ethnic minorities and other medically underserved populations. Dr. Ruffin is the former 
Director of the NIH Office of Research on Minority Health, NIH. A native of New 
Orleans, Louisiana, Dr. Ruffin received his B.A. from Dillard University and a Master's
degree from Atlanta University. He earned a Ph.D. at Kansas State University in system- 
atic and developmental biology and then pursued postdoctoral studies at Harvard 
University. Prior to joining the NIH, he was Dean of the College of Arts and Sciences 
at North Carolina Central University. 

Robert L. Santos, M.A. is currently the Executive Vice President and Partner at NuStats
Partners, LP in Austin, Texas. He previously held the position of Principal Research Associate
at The Urban Institute in Washington, D.C., and Vice President of the Statistics and
Methodology Division in the National Opinion Research Center at the University of 
Chicago. Mr. Santos has more than 20 years of experience in the survey research industry 
as a sampling statistician, statistician, project director, and senior research administrator. 

He specializes in survey methodology, survey design, and rare element sample designs, 
especially designs related to Hispanic or other minority groups. He is a member of the 
Editorial Board of the Public Opinion Quarterly, holds office as Secretary-Treasurer of 
AAPOR, is a member of the Census Advisory Committee of Professional Associations, and 
holds other offices and committee memberships in the American Statistical Association. 

Gillian Stevens, Ph.D. is Associate Professor of Sociology at the University of Illinois at
Urbana-Champaign. Her research interests concern immigration and language. She has 
published articles on patterns of ethnic, racial, and linguistic intermarriage, and on patterns 
of language usage, language shift, and English acquisition among immigrants in the United 
States. Dr. Stevens received her Ph.D. in Sociology from the University of Wisconsin-Madison 

Nathan Stinson, M.D., Ph.D., M.P.H. became the Deputy Assistant Secretary for Minority
Health and the Director of the Office of Minority Health on August 2, 1999. As Deputy 
Assistant Secretary, Dr. Stinson reports to the Assistant Secretary for Health/Surgeon 
General and works closely with all agencies throughout the Department of Health and 
Human Services (DHHS). Under Dr. Stinson's leadership, the Office of Minority Health
develops and coordinates Federal health policy that addresses minority health concerns 
and ensures that Federal, State, and local health programs take into account the needs 
of disadvantaged, racial and ethnic populations. Dr. Stinson also oversees regional minor- 
ity health consultants at the ten DHHS regional offices. Dr. Stinson received his B.A. from 
the University of Colorado, his master's degree from the University of California, and his
Ph.D. from the University of Colorado — all in Environmental Biology. He received his 
M.D. from the University of Colorado Medical School, and his M.P.H. in Health Care 
Administration from the Uniformed Services University of Health Sciences. 

Katherine Wallman currently serves as Chief Statistician at the U.S. Office of Management 
and Budget. In this capacity she is responsible for overseeing and coordinating Federal statistical
policies, standards, and programs; developing and fostering long-term improvements
in Federal statistical activities; and representing the Federal Government in international 
organizations such as the United Nations Statistical Commission. Prior to assuming this 
position, Ms. Wallman served for more than a decade as Executive Director of the Council 
of Professional Associations on Federal Statistics, a coalition of organizations concerned with 
fostering communication among users and producers of Federal statistics and improving 
the utility and accessibility of the Nation's statistical resources. Her special interests include
fostering improved dissemination of and access to Federal statistical information, increasing 
cooperation between the several levels of government in the production of national statistics,
strengthening the interface between academic and government statisticians, and enhancing 
the statistical literacy of the public. 

Appendix D. Workshop Participants



Christine Bachrach, National Institute of Child Health and Human Development 

Wendy Baldwin, Office of Extramural Research, NIH 

Angela Caroline Bates, Office of Research on Women's Health, NIH

Daniel Berch, National Institute on Aging 

Richard L. Bitzer, U.S. Census Bureau 

Carol Briggs, U.S. Census Bureau 

Debra Brody, Centers for Disease Control and Prevention 
Natasha Cabrera, National Institute of Child Health and Human Development 
Virginia Cain, Office of Behavioral and Social Sciences Research, NIH 
Alfredo Calvillo, Centers for Disease Control and Prevention 

Olivia Carter-Pokras, Office of Minority Health, Office of the Assistant Secretary for Health
Yinong Chong, National Health and Nutrition Examination Statistics 
Centers for Disease Control and Prevention 
Rebecca L. Clark, National Institute of Child Health and Human Development 
Craig Coelen, National Opinion Research Center, University of Chicago 
Kelly Jones Dresen, Translation and Interpretation Department 
Comprehensive Language Center, Inc. 

Brad Edwards, Early Childhood Longitudinal Study Birth Cohort, Westat 
Sumru Erkut, Center for Research on Women, Wellesley College 
Robert Frederking, Language Technologies Institute, Carnegie Mellon University 
Marilyn Gaska, Lockheed Martin Federal Systems/Owego 

David Haines, Departments of Sociology and Anthropology, George Mason University 
J. Taylor Harden, National Institute on Aging 

Marjorie Hinsdale, National Household Survey on Drug Abuse, Research Triangle Institute 
Michael W. Horrigan, National Longitudinal Survey Program, Bureau of Labor Statistics 
Guillermina Jasso, Department of Sociology, New York University 

Joel Kennet, National Center for Health Statistics, Centers for Disease Control and Prevention 
Raynard Kington, Division of Health Examination Statistics, Centers for Disease Control 
and Prevention 

Rose Maria Li, National Institute on Aging 

Yvonne Maddox, Office of the Director, NIH and National Institute of Child Health and 
Human Development 

Patricia Maher, Survey Research Center, University of Michigan 
Edith McArthur, Office of Educational Research and Improvement 
National Center for Education Statistics 
Peggy McCardle, National Institute of Child Health and Human Development 
James McNally, The National Archive of Computerized Data on Aging 
University of Michigan 
Richard Nakamura, National Institute of Mental Health

Thomas E. Perez, Office for Civil Rights, U.S. Department of Health and Human Services 
Michael Pergamit, Economic Studies, National Opinion Research Center 
John Ruffin, Office of Research on Minority Health, NIH 
Robert L. Santos, Urban Institute 

Susan Schechter, Statistical Policy Office, Office of Management and Budget 
Belinda Seto, Office of Extramural Research, NIH 

Gillian Stevens, Department of Sociology, University of Illinois at Urbana-Champaign 
Nathan Stinson, Office of Minority Health, Office of the Assistant Secretary for Health 
Richard Suzman, National Institute on Aging 
Katherine Wallman, Office of Management and Budget 